Abstract

The Nelder-Mead algorithm, a longstanding direct search method for unconstrained
optimization published in 1965, is designed to minimize a scalar-valued function f of
n real variables using only function values, without any derivative information. Each
Nelder-Mead iteration is associated with a nondegenerate simplex defined by n + 1 vertices
and their function values; a typical iteration produces a new simplex by replacing
the worst vertex by a new point. Despite the method's widespread use, theoretical results
have been limited: for strictly convex objective functions of one variable with bounded
level sets, the algorithm always converges to the minimizer; for such functions of two
variables, the diameter of the simplex converges to zero, but examples constructed by
McKinnon show that the algorithm may converge to a nonminimizing point.

This paper considers the restricted Nelder-Mead algorithm, a variant that does not 
allow expansion steps. In two dimensions we show that, for any nondegenerate starting
simplex and any twice-continuously differentiable function with positive definite Hessian
and bounded level sets, the algorithm always converges to the minimizer. The proof
is based on treating the method as a discrete dynamical system, and relies on several 
techniques that are non-standard in convergence proofs for unconstrained optimization. 

1 Introduction 



Since the mid-1980s, interest has steadily grown in derivative-free methods (also called non-derivative 
methods) for solving optimization problems, unconstrained and constrained. Derivative-free methods 
that adaptively construct a local model of relevant nonlinear functions are often described as "model-based",
and derivative-free methods that do not explicitly involve such a model tend to be called
"direct search" methods. See [5] for a recent survey of derivative-free methods; discussions focusing 
on direct search methods include, for example, [3lJ El QUI H31 [22] . 

The Nelder-Mead (NM) simplex method [20] is a direct search method. Each iteration of the 
NM method begins with a nondegenerate simplex (a geometric figure in n dimensions of nonzero 
volume that is the convex hull of n + 1 vertices), defined by its vertices and the associated values 
of f. One or more trial points are computed, along with their function values, and the iteration
produces a new (different) simplex such that the function values at its vertices typically satisfy a 
descent condition compared to the previous simplex. 

The NM method is appealingly simple to describe (see Figure 2), and has been widely used (along
with numerous variants) for more than 45 years, in many scientific and engineering applications. 
But little mathematical analysis of any kind of the method's performance has appeared, with a few 
exceptions such as [301 E3 (from more than 20 years ago) and (more recently) [9] . As we discuss in 
more detail below, obtaining even limited convergence proofs for the original method has turned out 
to be far from simple. The shortage of theory, plus the discovery of low-dimensional counterexamples 
(see (1.1)), has made the NM method an outlier among modern direct search methods, which are
deliberately based on a rigorous mathematical foundation. (See, for example, [51 [U 03J [JJ , as well as 
more recent publications about direct search methods for constrained problems.) Nevertheless the 
NM method retains importance because of its continued use and availability in computer packages 
(see [23l [TT1 [7] ) and its apparent usefulness in some situations. 

In an effort to develop positive theory about the original NM algorithm, an analysis of its
convergence behavior was initiated in [15] in 1998, along with resolution of ambiguities in [20] about
whether function comparisons involve "greater than" or "greater than or equal" tests.¹ In what
follows we use the term Nelder-Mead algorithm to refer generically to one of the precisely specified
procedures in [15]; these contain a number of adjustable parameters (coefficients), and the standard
coefficients represent an often-used choice. For strictly convex objective functions with bounded
level sets, [15] showed convergence of the most general form of the NM algorithm to the minimizer
in one dimension. For the NM algorithm with standard coefficients in dimension two, where the 
simplex is a triangle, it was shown that the function values at the simplex vertices converge to a 
limiting value, and furthermore that the diameter of the simplices converges to zero. But it was not 
shown that the simplices always converge to a limiting point, and up to now this question remains 
unresolved. 

Taking the opposite perspective, McKinnon [18] devised a family of two-dimensional counterexamples
consisting of strictly convex functions with bounded level sets and a specified initial simplex,
for which the NM simplices converge to a nonminimizing point. In the smoothest McKinnon example,
the objective function is

(1.1) $f(x, y) = \begin{cases} 2400|x|^3 + y + y^2 & \text{if } x \le 0, \\ 6x^3 + y + y^2 & \text{if } x \ge 0, \end{cases}$

when the vertices of the starting simplex are (0, 0), (1, 1), and ((1 + √33)/8, (1 − √33)/8). Note
that f is twice-continuously differentiable and that its Hessian is positive definite except on the
line x = 0, where it is singular (in particular, at the origin). As shown in Figure 1, the NM algorithm
converges to the origin (one of the initial vertices) rather than to the minimizer (0, −1/2), performing
an infinite sequence of inside contractions (see Section 2) in which the best vertex of the initial
triangle is never replaced.

Functions proposed by various authors on which the NM algorithm fails to converge to a minimizer
are surveyed in [18], but counterexamples in the McKinnon family illustrated by (1.1) constitute
the "nicest" functions for which the NM algorithm converges to a nonstationary point.

An algorithmic flaw that has been observed is that the iterations "stagnate" or "stall", often
because the simplex becomes increasingly close to degenerate (as depicted in Figure 1). Previously
proposed corrective strategies include: placing more restrictions on moves that decrease the size of 
the simplex; imposing a "sufficient decrease" condition (stronger than simple decrease) for accepting 
a new vertex; and resetting the simplex to one that is "nice" . See, for example, [351 1301 HH [HI (Ml 
[T9l [5J , a small selection of the many papers that include convergence results for modifications of 
Nelder-Mead.

Our object in this paper is to fill in additional theory for the NM algorithm in the two-dimensional
case, which remains of interest in its own right. As noted by McKinnon [18, page 148], it is not even
known whether the NM algorithm converges for the prototypically nice function $f(x, y) = x^2 + y^2$.



1 Resolution of these ambiguities can have a noticeable effect on the performance of the algorithm; see [8]. 



Figure 1: The NM algorithm's failure on the McKinnon counterexample (1.1).



Here we answer this question affirmatively for a simplified variant of the NM algorithm, where the
simplification reduces the number of allowable moves rather than attempting to "fix" the method.
In the original NM algorithm (see Section 2), the allowable moves are reflection, expansion, outside
contraction, inside contraction, and shrink; an expansion doubles the volume of an NM simplex,
while all other moves either leave the volume the same or decrease it. An expansion is tried only
after the reflection point produces a strict improvement in the best value of f; the motivation is to
allow a longer step along an apparently promising direction. The restricted Nelder-Mead (RNM)
algorithm defined in Section 2 does not allow expansion steps. Thus we are in effect considering a
"small step" NM algorithm.

Our analysis applies to the following class of functions: 

Definition 1.1. Let $\mathcal{F}$ denote the class of twice-continuously differentiable functions $f\colon \mathbb{R}^2 \to \mathbb{R}$
with bounded level sets and everywhere positive definite Hessian.

The class $\mathcal{F}$ is a subclass of those considered in [15], where there is no requirement of
differentiability.

The contribution of this paper is to prove convergence of the restricted Nelder-Mead algorithm
for functions in $\mathcal{F}$:

Theorem 1.2. (appears again as Theorem 3.17) If the RNM algorithm is applied to a function
$f \in \mathcal{F}$, starting from any nondegenerate triangle, then the algorithm converges to the unique
minimizer of f.

Remark 1.3. Theorem 1.2 immediately implies a generalization to a larger class of functions. Namely,
if $f \in \mathcal{F}$ and $g\colon \mathbb{R} \to \mathbb{R}$ is a strictly increasing function, then the RNM algorithm applied to $\hat f := g \circ f$
converges, because the RNM steps for $\hat f$ are identical to those for f.
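Remark 1.3 uses only the fact that every RNM decision is a comparison between function values. The following Python sketch illustrates this under stated assumptions and is not the authors' code: `rnm_step` and `run` are hypothetical helpers implementing the iteration of Section 2 with standard coefficients, ties are broken by Python's stable sort (a simplification of the tie-breaking rules in [15]), and the quadratic f and the choice g = exp are arbitrary examples.

```python
import math


def rnm_step(f, simplex):
    """One RNM iteration (standard coefficients, no expansion).

    `simplex` is a list of n+1 points given as tuples. Ties are broken by
    Python's stable sort, an assumption standing in for the rules of [15].
    """
    pts = sorted(simplex, key=f)                  # order: f(p1) <= ... <= f(p_{n+1})
    n = len(pts) - 1
    kept, worst = pts[:-1], pts[-1]
    f_n, f_worst = f(pts[-2]), f(worst)
    cen = tuple(sum(c) / n for c in zip(*kept))   # average of the n best vertices
    p_r = tuple(2 * c - w for c, w in zip(cen, worst))
    f_r = f(p_r)
    if f_r < f_n:                                 # reflect
        return kept + [p_r], "reflect"
    if f_r < f_worst:                             # outside contraction
        p_out = tuple((c + r) / 2 for c, r in zip(cen, p_r))
        if f(p_out) <= f_r:
            return kept + [p_out], "outside"
    else:                                         # inside contraction
        p_in = tuple((c + w) / 2 for c, w in zip(cen, worst))
        if f(p_in) < f_worst:
            return kept + [p_in], "inside"
    best = pts[0]                                 # shrink toward the best vertex
    return [best] + [tuple((a + b) / 2 for a, b in zip(best, p)) for p in pts[1:]], "shrink"


def run(f, simplex, iters):
    """Apply `iters` RNM iterations and record the move taken at each one."""
    moves = []
    for _ in range(iters):
        simplex, move = rnm_step(f, simplex)
        moves.append(move)
    return simplex, moves


def f(p):
    # Illustrative member of the class F: Hessian [[2, 1], [1, 4]] is positive definite.
    x, y = p
    return (x - 1.0) ** 2 + x * y + 2.0 * (y + 0.5) ** 2 + 3.0


def g_of_f(p):
    return math.exp(f(p))                         # g = exp is strictly increasing


start = [(0.0, 0.0), (1.0, 1.0), (0.5, -1.0)]
simplex_f, moves_f = run(f, start, 20)
simplex_g, moves_g = run(g_of_f, start, 20)
print(moves_f == moves_g, simplex_f == simplex_g)
```

Because both runs see the same comparison outcomes, they take the same moves and produce the same vertices. In floating point a strictly increasing g can still map two nearly equal values to the same number, so the sketch is deliberately kept to a short run.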

Remark 1.4. Because the NM iterations in the McKinnon examples include no expansion steps, the
RNM algorithm also will fail to converge to a minimizer on these examples. It follows that, in order
to obtain a positive convergence result, additional assumptions on the function beyond those in [15]
must be imposed. In particular, the positive-definiteness condition on the Hessian in Theorem 1.2
rules out the smoothest McKinnon example (1.1), in which the Hessian is singular at the origin (the
nonminimizing initial vertex to which the NM algorithm converges).

An interesting general property of the Nelder-Mead algorithm is the constantly changing shape 
of the simplex as the algorithm progresses. Understanding the varying geometry of the simplex 
seems crucial to explaining how the algorithm behaves. Our proof of Theorem 1.2 analyzes the
RNM algorithm as a discrete dynamical system, in which the shapes of the relevant simplices (with
a proper scaling) form a phase space for the algorithm's behavior. The imposed hypothesis on the
Hessian, which is stronger than strict convexity, allows a crucial connection to be made between a
(rescaled) local geometry and the vertex function values. We analyze the algorithm's behavior in a
transformed coordinate system that corrects for this rescaling.

The proof of Theorem 1.2 establishes convergence by contradiction, by showing that the algorithm
can find no way not to converge. We make, in effect, a "Sherlock Holmes" argument: Once you have
eliminated the impossible, whatever remains, however improbable, must be the truth.² We show
that, in order not to converge to the minimizer, the triangles would need to flatten out according
to a particular geometric scaling, but there is no set of RNM steps permitting this flattening to
happen. This result is confirmed through an auxiliary potential function measuring the deviation
from scaling. One can almost say that the RNM algorithm converges in spite of itself.

2 The restricted Nelder-Mead algorithm

Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a function to be minimized, and let $p_1, \dots, p_{n+1}$ be the vertices of a nondegenerate
simplex in $\mathbb{R}^n$. One iteration of the RNM algorithm (with standard coefficients) replaces the simplex
by a new one according to the following procedure.

One iteration of the standard RNM algorithm. 

1. Order. Order and label the n + 1 vertices to satisfy $f(p_1) \le f(p_2) \le \cdots \le f(p_{n+1})$, using
appropriate tie-breaking rules such as those in [15], and write $f_i := f(p_i)$.

2. Reflect. Calculate $\bar p = \sum_{i=1}^{n} p_i/n$, the average of the n best points (omitting $p_{n+1}$). Compute
the reflection point $p_r$, defined as $p_r := 2\bar p - p_{n+1}$, and evaluate $f_r = f(p_r)$. If $f_r < f_n$, accept
the reflected point $p_r$ and terminate the iteration.

3. Contract. If $f_r \ge f_n$, perform a contraction between $\bar p$ and the better of $p_{n+1}$ and $p_r$.

a. Outside contract. If $f_n \le f_r < f_{n+1}$ (i.e., $p_r$ is strictly better than $p_{n+1}$), perform an
outside contraction: calculate the outside contraction point $p_{out} = \tfrac12(\bar p + p_r)$, and evaluate
$f_{out} = f(p_{out})$. If $f_{out} \le f_r$, accept $p_{out}$ and terminate the iteration; otherwise, go to Step 4
(perform a shrink).

b. Inside contract. If $f_r \ge f_{n+1}$, perform an inside contraction: calculate the inside
contraction point $p_{in} = \tfrac12(\bar p + p_{n+1})$, and evaluate $f_{in} = f(p_{in})$. If $f_{in} < f_{n+1}$, accept $p_{in}$ and
terminate the iteration; otherwise, go to Step 4 (perform a shrink).

4. Perform a shrink step. Evaluate f at the n points $v_i = \tfrac12(p_1 + p_i)$, $i = 2, \dots, n+1$. The
(unordered) vertices of the simplex at the next iteration consist of $p_1, v_2, \dots, v_{n+1}$.
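The four steps above can be sketched in Python as follows, assuming vertices are stored as tuples of floats. This is a minimal illustration rather than a reference implementation: the function name `rnm_step` and the test function `sphere` are hypothetical, and ties in Step 1 are broken by Python's stable sort rather than by the rules of [15].

```python
def rnm_step(f, simplex):
    """One iteration of the restricted Nelder-Mead (RNM) algorithm, standard coefficients.

    `simplex` is a list of n+1 points in R^n, each a tuple of floats. Returns the new
    list of vertices and the name of the move taken. Ties in Step 1 are broken by
    Python's stable sort (an assumption; [15] gives precise tie-breaking rules).
    """
    # Step 1 (Order): p_1, ..., p_{n+1} with f(p_1) <= ... <= f(p_{n+1}).
    pts = sorted(simplex, key=f)
    n = len(pts) - 1
    kept, worst = pts[:-1], pts[-1]
    f_n, f_worst = f(pts[-2]), f(worst)
    # Step 2 (Reflect): p_bar = average of the n best points, p_r = 2 p_bar - p_{n+1}.
    p_bar = tuple(sum(c) / n for c in zip(*kept))
    p_r = tuple(2 * c - w for c, w in zip(p_bar, worst))
    f_r = f(p_r)
    if f_r < f_n:
        return kept + [p_r], "reflect"
    if f_r < f_worst:
        # Step 3a (Outside contract): f_n <= f_r < f_{n+1}.
        p_out = tuple((c + r) / 2 for c, r in zip(p_bar, p_r))
        if f(p_out) <= f_r:
            return kept + [p_out], "outside"
    else:
        # Step 3b (Inside contract): f_r >= f_{n+1}.
        p_in = tuple((c + w) / 2 for c, w in zip(p_bar, worst))
        if f(p_in) < f_worst:
            return kept + [p_in], "inside"
    # Step 4 (Shrink): v_i = (p_1 + p_i) / 2 for i = 2, ..., n+1.
    p1 = pts[0]
    return [p1] + [tuple((a + b) / 2 for a, b in zip(p1, p)) for p in pts[1:]], "shrink"


def sphere(p):
    return sum(c * c for c in p)


# Worst vertex (1, 1) reflects through p_bar = (0.5, 0.5) onto the minimizer (0, 0).
print(rnm_step(sphere, [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))
# Reflection point (2, 0.1) is worse than every vertex, so an inside contraction is tried.
print(rnm_step(sphere, [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.1)]))
```

The first call takes a reflect step; in the second, the inside contraction point (−0.25, 0.025) improves on the worst vertex and is accepted. The same function works unchanged for any n, since the vertices are handled coordinate by coordinate.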

The result of an RNM iteration is either: (1) a single new vertex, the accepted point, that replaces
the worst vertex $p_{n+1}$ in the set of vertices for the next iteration; or (2) if a shrink is performed, a
set of n new points that, together with $p_1$, form the simplex at the next iteration.

Starting from a given nondegenerate simplex, let $p_1^{(k)}, \dots, p_{n+1}^{(k)}$ be the vertices at the start
of the kth iteration. Let $z \in \mathbb{R}^n$ be a point. We say that the RNM algorithm converges to z if
$\lim_{k\to\infty} p_i^{(k)} = z$ for every $i \in \{1, \dots, n+1\}$.

Remark 2.1. In two dimensions, a reflect step performs a 180° rotation of the triangle around $\bar p$,
so the resulting triangle is congruent to the original one. But in higher dimensions, the reflected 
simplex is not congruent to the original. 

Remark 2.2. Shrink steps are irrelevant in this paper because we are concerned only with strictly 
convex objective functions, for which shrinks cannot occur (Lemma 3.5 of [15]). It follows that, at
each NM iteration, the function value at the new vertex is strictly less than the worst function value 
at the previous iteration. 



2 A. Conan Doyle, "The Sign of the Four", Lippincott's Monthly Magazine, February 1890. 
Figure 2: The five possible moves in the original NM algorithm are shown. The original simplex is surrounded
by a dashed line, and its worst vertex is labeled $p_3$. The point $\bar p$ is the average of the two best
vertices. The shaded figures are NM simplices following reflection, expansion, outside contraction, inside
contraction, and shrink, respectively. (In the "shrink" figure, the best vertex is labeled $p_1$.) The "expansion"
step is omitted in the RNM algorithm.

Remark 2.3. The original Nelder-Mead algorithm differs from the above in Step 2. Namely, if $p_r$ is
better than all n + 1 of the vertices, the original NM algorithm tries evaluating f at the expansion
point $p_e := \bar p + \chi(\bar p - p_{n+1})$ for a fixed expansion coefficient $\chi > 1$, and the worst vertex $p_{n+1}$ is
then replaced by the better of $p_e$ and $p_r$. In fact, Nelder and Mead proposed a family of algorithms,
depending on coefficients for reflection, contraction, and shrinkage in addition to expansion. A
complete, precise definition of an NM iteration is given in [15], along with a set of tie-breaking rules.
Instances of the moves in the original NM algorithm are shown in Figure 2.

Remark 2.4. One feature of the RNM algorithm that makes it easier to analyze than the original
algorithm is that the volume of the simplex is non-increasing at each step. The volume thus serves
as a Lyapunov function.³
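Remark 2.4 can be checked directly in two dimensions. The Python sketch below uses exact rational arithmetic; the helper names `area`, `midpoint`, and `rnm_candidates` and the sample triangle are hypothetical. It builds the triangle produced by each RNM move from one starting triangle, with $p_1$ taken as best and $p_3$ as worst, and compares areas.

```python
from fractions import Fraction as Fr


def area(tri):
    """Area of a triangle given as three (x, y) pairs (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def rnm_candidates(p1, p2, p3):
    """Triangles produced by each RNM move when p1 is best and p3 is worst."""
    p_bar = midpoint(p1, p2)                      # average of the two best vertices
    p_r = (2 * p_bar[0] - p3[0], 2 * p_bar[1] - p3[1])
    return {
        "reflect": (p1, p2, p_r),
        "outside": (p1, p2, midpoint(p_bar, p_r)),
        "inside": (p1, p2, midpoint(p_bar, p3)),
        "shrink": (p1, midpoint(p1, p2), midpoint(p1, p3)),
    }


tri = ((Fr(0), Fr(0)), (Fr(3), Fr(1)), (Fr(1), Fr(4)))
for move, new_tri in rnm_candidates(*tri).items():
    print(move, area(new_tri) / area(tri))
```

The area ratios come out as 1 for a reflection, 1/2 for either contraction, and 1/4 for a shrink. An NM expansion with the standard coefficient χ = 2 would double the area, which is why excluding it makes the area non-increasing.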

We henceforth consider the RNM algorithm in dimension two, for which it is known that the 
simplex diameter converges to zero. 

Lemma 2.5. Suppose that the RNM algorithm is applied to a strictly convex 2-variable function 
with bounded level sets. Then for any nondegenerate initial simplex, the diameters of the RNM 
simplices (triangles) produced by the algorithm converge to 0. 

Proof. The proof given in [15, Lemma 5.2] for the original NM algorithm applies even when expansion
steps are disabled. □ 



3 Convergence 
3.1 The big picture 

Because the logic of the convergence proof is complicated, we begin with an overview of the argument. 
Each $f \in \mathcal{F}$ is strictly convex, so by Lemma 2.5 we know that the evolution of any triangle under
the RNM algorithm has the diameter of the triangle converging to zero. (We do not yet know that
the triangles converge to a limit point.) The convergence proof proceeds by contradiction, making
an initial hypothesis (Hypothesis 1 in Section 3.4) that the (unique) minimizer of f is not a limit
point of the RNM triangles. Under this condition, all three RNM vertices must approach a level set 



3 See Definition 1.3.4 in [27, page 23].
corresponding to a function value strictly higher than the optimal value. By our assumptions on f,
this level set is a strictly convex closed curve with a continuously differentiable tangent vector.

The RNM triangle must become small as it approaches this bounding level set. Therefore, from 
the viewpoint of the triangle, blown up to have (say) unit diameter, the level set flattens out to a 
straight line. The heuristic underlying our argument is that, in order for this to happen, the triangle 
must itself have its shape flatten out, with its width in the level set direction (nearly horizontal, as 
seen from the triangle) being roughly the square root of its height in the perpendicular direction. In 
particular, its width becomes proportionally much larger than its height. A local coordinate frame
(Section 3.3) is defined in order to describe this phenomenon.

At the start of iteration k, we measure area and width in a local coordinate frame, and define a
quantity called "flatness" by $\Gamma_k := \text{area}_k/\text{width}_k^3$. If a reflection is taken during iteration k and the
same coordinate frame is retained, the area and width of the RNM triangle at iteration k + 1 remain
the same, so $\Gamma_{k+1} = \Gamma_k$. Hence, in order for the diameter to converge to zero (Lemma 2.5), there
must be infinitely many contraction steps. We show that, at a sufficiently advanced iteration k of
the RNM algorithm, a necessary condition for a contraction to occur is that $\Gamma_k < 10$; we also show
that the value of Γ eventually unavoidably increases as the algorithm proceeds. A contradiction thus
arises because no combination of the permitted reflection and contraction steps allows the needed
square-root rate of decrease.
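The scaling heuristic can be made concrete. Assuming the flatness definition $\Gamma = A/w^3$ used above, the Python sketch below (helper names hypothetical; the triangle is arbitrary) applies the anisotropic scaling $(x, y) \mapsto (tx, t^2 y)$, under which the height is of the order of the square of the width: $h/w$ tends to zero while Γ does not change.

```python
from fractions import Fraction as Fr


def width(tri):
    return max(p[0] for p in tri) - min(p[0] for p in tri)


def height(tri):
    return max(p[1] for p in tri) - min(p[1] for p in tri)


def area(tri):
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


def flatness(tri):
    return area(tri) / width(tri) ** 3            # Gamma = A / w^3


def parabolic_scale(tri, t):
    """(x, y) -> (t x, t^2 y): width scales by t, height by t^2, area by t^3."""
    return tuple((t * x, t * t * y) for x, y in tri)


tri = ((Fr(0), Fr(0)), (Fr(2), Fr(1)), (Fr(1), Fr(-1)))
for t in (Fr(1), Fr(1, 10), Fr(1, 1000)):
    s = parabolic_scale(tri, t)
    print(t, height(s) / width(s), flatness(s))
```

Since reflections leave Γ unchanged in a fixed frame, any change in Γ along the RNM sequence has to come from contractions or from the change of coordinate frame between iterations.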

The argument is complicated because the local coordinate frame changes at every step. Near the 
end of the proof (in Proposition 3.15), we analyze sequences of no more than 14 steps, beginning
with a contraction, in an advanced phase of the algorithm. Using a coordinate frame defined by 
a vertex of the first triangle in the sequence, we show that switching to a new coordinate system 
defined via the final triangle in the sequence makes only a small change in the flatness. This allows 
us to show that the flatness is inflated by a factor of at least 1.01 after at most 14 steps, which 
eventually means that a contraction cannot be taken. Since the triangle cannot reflect forever, our 
contradiction hypothesis must have been false; i.e., the method must converge. 

3.2 Notation 

Points in two dimensions are denoted by boldface lower-case letters, but a generic point is often
called p, which is treated as a column vector and written as $p = (x, y)^T$. We shall also often use
an affinely transformed coordinate system with generic point denoted by $\tilde p = (\tilde x, \tilde y)^T$. To stress the
(x, y) coordinates of a specific point, say b, we write $b = (b_x, b_y)^T$.

For future reference, we explicitly give the formulas for the reflection and contraction points in
two dimensions:

(3.1) $p_r = p_1 + p_2 - p_3$ (2-d reflection);

(3.2) $p_{out} = \tfrac34(p_1 + p_2) - \tfrac12 p_3$ (2-d outside contraction);

(3.3) $p_{in} = \tfrac14(p_1 + p_2) + \tfrac12 p_3$ (2-d inside contraction).

Given the three vertices of a triangle, the reflection and contraction points depend only on which
(one) vertex is labeled as "worst".
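The closed forms (3.1)-(3.3) follow by substituting $\bar p = \tfrac12(p_1 + p_2)$ into the definitions of Section 2. A short check in exact arithmetic (Python; the helpers `lin` and `rotate_half_turn` and the sample vertices are hypothetical) confirms them, together with the 180° rotation of Remark 2.1.

```python
from fractions import Fraction as Fr


def lin(*terms):
    """Linear combination sum(c * p) of 2-D points p with scalar coefficients c."""
    return tuple(sum(c * p[i] for c, p in terms) for i in range(2))


p1, p2, p3 = (Fr(0), Fr(0)), (Fr(5), Fr(2)), (Fr(1), Fr(7))
half = Fr(1, 2)

# Definitions from Section 2 with n = 2 and p3 the worst vertex.
p_bar = lin((half, p1), (half, p2))
p_r = lin((2, p_bar), (-1, p3))
p_out = lin((half, p_bar), (half, p_r))
p_in = lin((half, p_bar), (half, p3))

# Closed forms (3.1)-(3.3).
assert p_r == lin((1, p1), (1, p2), (-1, p3))
assert p_out == lin((Fr(3, 4), p1), (Fr(3, 4), p2), (-half, p3))
assert p_in == lin((Fr(1, 4), p1), (Fr(1, 4), p2), (half, p3))


def rotate_half_turn(q):
    """Rotation by 180 degrees about p_bar: q -> 2 p_bar - q."""
    return lin((2, p_bar), (-1, q))


# Remark 2.1: the half-turn swaps p1 and p2 and sends p3 to p_r.
assert (rotate_half_turn(p1), rotate_half_turn(p2), rotate_half_turn(p3)) == (p2, p1, p_r)
print("reflection:", p_r, "outside:", p_out, "inside:", p_in)
```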

3.3 A changing local coordinate system 

The type of move at each RNM iteration is governed by a discrete decision, based on comparing
values of f. Heuristically, for a very small triangle near a point b, the result of the comparison
is usually unchanged if we replace f by its degree-2 Taylor polynomial centered at b. If b is a
nonminimizing point, then we can simplify the function further by making an affine transformation
into a new coordinate system $\tilde p = (\tilde x, \tilde y)$ (depending on b) in which the Taylor polynomial has the
form

constant $+ \tilde y + \tfrac12 \tilde x^2$.

This motivates the following lemma, which is a version of Taylor's theorem.



Lemma 3.1. (Definition of local coordinate frame.) Let $f \in \mathcal{F}$. Given a point b and a
nonsingular 2 × 2 matrix M, we may define an affine transformation

(3.4) $\tilde p = M^{-1}(p - b)$

(with inverse map $p = M\tilde p + b$).

(i) For each point b that is not the minimizer of f, there exists a unique M with det M > 0
such that when the function f of $p = (x, y)^T$ is re-expressed in the new coordinate system
$\tilde p = (\tilde x, \tilde y)^T$ above, the result has the form

(3.5) $f(p) = f(b) + \tilde y + \tfrac12 \tilde x^2 + r(\tilde x, \tilde y)$,

where r is an error term satisfying

(3.6) $r(\tilde x, \tilde y) = \tfrac12 a \tilde y^2 + o(\max(|\tilde x|^2, |\tilde y|^2))$

as $(\tilde x, \tilde y) \to 0$ (i.e., as $(x, y)^T \to b$), for some a > 0.

(ii) The function r in (3.5) satisfies $\partial r/\partial \tilde x = o(\max(|\tilde x|, |\tilde y|))$ and $\partial r/\partial \tilde y = o(|\tilde x|) + O(|\tilde y|)$, and the rate
at which the o(·) terms approach zero and the bounds implied by O(·) can be made uniform
for b in any compact set not containing the minimizer of f.

(iii) As b varies over a compact set not containing the minimizer of f, the matrices M and $M^{-1}$
are bounded in norm and uniformly continuous.

Proof. Let $g = \nabla f(b)$ and $H = \nabla^2 f(b)$ denote, respectively, the gradient and Hessian matrix of
f at b. Since f is strictly convex, its gradient can vanish only at the unique minimizer, so $g \ne 0$.
Because f is twice-continuously differentiable, we can expand it in a Taylor series around b:

(3.7) $f(p) = f(b) + g^T(p - b) + \tfrac12 (p - b)^T H (p - b) + o(\|p - b\|^2)$

(3.8) $\quad\;\; = f(b) + g^T M \tilde p + \tfrac12 \tilde p^T M^T H M \tilde p + o(\|p - b\|^2)$.

The Taylor expansion (3.8) has the desired form if

$g^T M = (0 \;\; 1)$ and $M^T H M = \begin{pmatrix} 1 & 0 \\ 0 & a \end{pmatrix}$

for some a > 0. In terms of the columns $m_1$ and $m_2$ of M, these conditions say

$g^T m_1 = 0, \quad g^T m_2 = 1, \quad m_1^T H m_1 = 1, \quad m_1^T H m_2 = 0,$

and then we may set $a := m_2^T H m_2$, which will be positive since H is positive definite and since the
conditions above force $m_2$ to be nonzero.

Since $g \ne 0$, the condition $g^T m_1 = 0$ says that $m_1$ is a multiple of the vector $g^\perp$ obtained by
rotating g by 90° clockwise: $m_1 = \xi_1 g^\perp$ for some $\xi_1$. The condition $m_1^T H m_1 = 1$ implies that
$m_1 \ne 0$. The condition $m_1^T H m_2 = 0$ says that $H m_2$ is a multiple of g. Since H is positive definite,
H is nonsingular, so the equation $Hw = g$ has the unique solution $w = H^{-1} g$, and then $m_2 = \xi_2 w$
for some $\xi_2$. The normalizations $g^T m_2 = 1$ and $m_1^T H m_1 = 1$ are equivalent to

(3.9) $\xi_2 = \dfrac{1}{g^T w} = \dfrac{1}{w^T H w}$ and $\xi_1^2 = \dfrac{1}{(g^\perp)^T H g^\perp}$;

the denominators are positive since H is positive definite and w and $g^\perp$ are nonzero. These conditions
determine M uniquely up to the choice of sign of its first column, i.e., the sign of $\xi_1$, but we have
not yet imposed the condition $\det M > 0$. We claim that it is the positive choice of $\xi_1$ that makes
$\det M > 0$: since $m_1$ and $m_2$ are then positive multiples of $g^\perp$ and w, respectively, the condition
$\det M > 0$ is equivalent to $\det(g^\perp \;\; w) = g^T w > 0$, or equivalently, $w^T H^T w > 0$, which is true since the matrix
$H^T = H$ is positive definite. This proves (i).

Since f is twice-continuously differentiable, g and H vary continuously as b varies within a
compact set not containing the minimizer of f. Hence M and $M^{-1}$ vary continuously as well. This
proves (ii) and (iii). □
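The construction in the proof is explicit enough to compute. The following Python sketch is illustrative only: `local_frame` is a hypothetical name, and the quadratic test function is an arbitrary choice with b = (1, 0), g = (3, 1), and H = [[2, 1], [1, 4]]. It builds M from (3.9) with the positive choice of ξ₁ and checks (3.5); for a quadratic f the remainder is exactly $r = \tfrac12 a \tilde y^2$.

```python
import math


def local_frame(g, H):
    """Matrix M and constant a of Lemma 3.1, built as in its proof.

    g is the (nonzero) gradient and H the symmetric positive definite Hessian of f
    at the base point b. M is returned row-wise; its columns are m1 and m2.
    """
    gx, gy = g
    (h11, h12), (h21, h22) = H

    def quad(u, v):
        # u^T H v for 2-vectors u, v
        return u[0] * (h11 * v[0] + h12 * v[1]) + u[1] * (h21 * v[0] + h22 * v[1])

    g_perp = (gy, -gx)                               # g rotated 90 degrees clockwise
    det_h = h11 * h22 - h12 * h21
    w = ((h22 * gx - h12 * gy) / det_h, (h11 * gy - h21 * gx) / det_h)  # w = H^{-1} g
    xi1 = 1.0 / math.sqrt(quad(g_perp, g_perp))      # (3.9), positive root gives det M > 0
    xi2 = 1.0 / (gx * w[0] + gy * w[1])              # (3.9): xi2 = 1 / (g^T w)
    m1 = (xi1 * g_perp[0], xi1 * g_perp[1])
    m2 = (xi2 * w[0], xi2 * w[1])
    a = quad(m2, m2)                                 # a = m2^T H m2
    return ((m1[0], m2[0]), (m1[1], m2[1])), a


def f(p):
    # Arbitrary quadratic example; at b = (1, 0): g = (3, 1), H = [[2, 1], [1, 4]].
    x, y = p
    return x * x + x * y + 2 * y * y + x


b = (1.0, 0.0)
M, a = local_frame((3.0, 1.0), ((2.0, 1.0), (1.0, 4.0)))
residuals = []
for xt, yt in [(0.3, -0.2), (-1.0, 0.5), (2.0, 1.5)]:
    # p = M p~ + b; compare f(p) - f(b) with y~ + x~^2/2 + a y~^2/2 from (3.5)-(3.6).
    p = (b[0] + M[0][0] * xt + M[0][1] * yt, b[1] + M[1][0] * xt + M[1][1] * yt)
    residuals.append(f(p) - f(b) - (yt + 0.5 * xt ** 2 + 0.5 * a * yt ** 2))
det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print("a =", a, "det M =", det_m, "max residual =", max(abs(r) for r in residuals))
```

In this example a works out to 7/32, which equals ξ₂, as the identity $m_2^T H m_2 = \xi_2^2\, g^T w = \xi_2$ predicts. The sketch cannot be applied when H is singular, as in Remark 3.2: computing w then divides by det H = 0.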






Remark 3.2. If H is positive semidefinite and singular, then the equation $Hw = g$ continues to
have a solution provided that $g \in \operatorname{range}(H)$, as in the McKinnon example (1.1). But in this case,
$H g^\perp = 0$, so $H m_1 = 0$, which contradicts $m_1^T H m_1 = 1$, and no matrix M exists.

Remark 3.3. As b approaches the minimizer of f, we have $g \to 0$, and the formulas obtained in the
proof of Lemma 3.1 show that $m_1$ remains bounded while $m_2$ and the value of a "blow up", so M
becomes unbounded in norm with an increasing condition number.

The local coordinate frame defined in Lemma 3.1 depends on the base point b, the gradient
vector g, and the Hessian matrix H. In the rest of this section, we use Φ(b) (with a nonminimizing
point b as argument) to denote the local coordinate frame with base point b. In the context of a
sequence of RNM iterations, $\Phi_k$ (or $\Phi(\Delta_k)$, with a subscripted RNM triangle as argument) will mean
the coordinate frame defined with a specified base point in RNM triangle $\Delta_k$.



3.3.1 Width, height, area, and flatness. 

This section collects some results about transformed RNM triangles. 

Definition 3.4. (Width, height, and flatness.) Let $f \in \mathcal{F}$, and let Δ denote a nondegenerate
triangle that lies in a compact set Q not containing the minimizer of f. Assume that we are given
a base point b in Q, along with the coordinate frame defined at b as in Lemma 3.1.

• The (transformed) width of Δ, denoted by w(Δ), is the maximum absolute value of the difference
in $\tilde x$-coordinates of two vertices of Δ;

• The (transformed) height, denoted by h(Δ), is the maximum absolute value of the difference
of $\tilde y$-coordinates of two vertices of Δ;

• The flatness of Δ, denoted by Γ(Δ), is

(3.10) $\Gamma(\Delta) := \dfrac{A(\Delta)}{w(\Delta)^3}$,

where A(Δ) is the (positive) area of Δ measured in the transformed coordinates.

The argument Δ may be omitted when it is obvious.

Lemma 3.5. (Effects of a reflection) The (transformed) height and width of an RNM triangle 
are the same as those of its reflection, if the same base point is used to define the local coordinate 
frame for both triangles. 

Proof. The new triangle is a 180° rotation of the old triangle. □ 
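Lemma 3.5 can be checked numerically. Because the map (3.4) is affine, it sends $\bar p$ to the midpoint of the transformed best vertices, so reflecting in the original coordinates and then transforming gives the same triangle as transforming first and reflecting afterward. The Python sketch below (helper names hypothetical, triangle arbitrary) therefore works directly in transformed coordinates and compares width, height, and flatness (3.10) before and after reflecting each vertex in turn.

```python
from fractions import Fraction as Fr


def width(tri):
    return max(p[0] for p in tri) - min(p[0] for p in tri)


def height(tri):
    return max(p[1] for p in tri) - min(p[1] for p in tri)


def area(tri):
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


def flatness(tri):
    return area(tri) / width(tri) ** 3            # (3.10)


def reflect(tri, worst):
    """2-D reflection (3.1) of the vertex with index `worst`; returns the new triangle."""
    a, b = [p for i, p in enumerate(tri) if i != worst]
    q = tri[worst]
    return (a, b, (a[0] + b[0] - q[0], a[1] + b[1] - q[1]))


# A triangle already expressed in transformed coordinates (x~, y~).
tri = ((Fr(0), Fr(0)), (Fr(4), Fr(1, 5)), (Fr(1), Fr(-1, 3)))
for worst in range(3):
    new = reflect(tri, worst)
    same = (width(new), height(new), flatness(new)) == (width(tri), height(tri), flatness(tri))
    print("reflecting vertex", worst, "preserves w, h, Gamma:", same)
```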

The next lemma bounds the change in three quantities arising from small changes in the base 
point used for the local coordinate frames. In (iii), we need a hypothesis on the width and height
since for a tall thin triangle, a slight rotation can affect its flatness dramatically. 

Lemma 3.6. (Consequences of close base points.) Assume that $f \in \mathcal{F}$ and that Q is a
compact set that does not contain the minimizer of f. Let $b_1$ and $b_2$ denote two points in Q, and let Δ
denote an RNM triangle contained in Q. For $i \in \{1, 2\}$, let $w_i$, $h_i$, $A_i$, and $\Gamma_i$ be the transformed width,
height, area, and flatness of Δ measured in the local coordinate frame $\Phi(b_i)$ associated with $b_i$, and let
$M_i$ be the matrix of Lemma 3.1 associated with $\Phi(b_i)$.

(i) Given ε > 0, there exists δ > 0 (independent of $b_1$ and $b_2$) such that if $\|b_2 - b_1\| < \delta$, then

$\|M_2^{-1} M_1 - I\| < \varepsilon$.

(ii) Given ε > 0, there exists δ > 0 (independent of $b_1$, $b_2$, and Δ) such that if $\|b_1 - b_2\| < \delta$, then

(3.11) $(1 - \varepsilon) A_1 < A_2 < (1 + \varepsilon) A_1$.



(iii) Given C, ε > 0, there is δ > 0 (independent of $b_1$, $b_2$, and Δ) such that if $\|b_1 - b_2\| < \delta$ and
$w_1 > C h_1$, then

(3.12) $(1 - \varepsilon)\Gamma_1 < \Gamma_2 < (1 + \varepsilon)\Gamma_1$.

Proof.

(i) We have

$\|M_2^{-1} M_1 - I\| = \|M_2^{-1}(M_1 - M_2)\| \le \|M_2^{-1}\| \, \|M_1 - M_2\|$.

By Lemma 3.1(iii), the first factor $\|M_2^{-1}\|$ is uniformly bounded, and M is uniformly continuous
as a function of $b \in Q$, so the second factor $\|M_1 - M_2\|$ can be made as small as desired by
requiring $\|b_2 - b_1\|$ to be small.

(ii) Letting $\tilde p_1$ and $\tilde p_2$ denote the transformed versions of a point p in Q using $\Phi(b_1)$ and $\Phi(b_2)$,
respectively, we have

(3.13) $\tilde p_2 = M_2^{-1} M_1 \tilde p_1 + M_2^{-1}(b_1 - b_2)$,

so that $\tilde p_2$ and $\tilde p_1$ are related by an affine transformation with matrix $M_2^{-1} M_1$. When an affine
transformation with nonsingular matrix B is applied to the vertices of a triangle, the area of
the transformed triangle is equal to the area of the original triangle multiplied by $|\det B|$ [13,
page 144]. Applying this result to Δ gives

(3.14) $A_2 = A_1 \, |\det(M_2^{-1} M_1)|$.

Since $|\det B|$ is a continuous function of B and $\det I = 1$, the result follows from (i).

(iii) Because of (3.11), it suffices to prove the analogous inequalities for width instead of flatness.
Fixing two vertices of Δ, we let $v_i$ denote the vector from one to the other measured in $\Phi(b_i)$,
and let $x(v_i)$ denote the corresponding $\tilde x$-component. Then $|v_1| \le w_1 + h_1 = O(w_1)$, since
$w_1 > C h_1$. By (3.13), $v_2 = M_2^{-1} M_1 v_1$, so

$|x(v_2) - x(v_1)| \le |v_2 - v_1| = |(M_2^{-1} M_1 - I) v_1| = O(\|M_2^{-1} M_1 - I\| \cdot w_1)$.

This bounds the change in $\tilde x$-component of each vector of the triangle in passing from $\Phi(b_1)$ to
$\Phi(b_2)$, and it follows that

$|w_2 - w_1| = O(\|M_2^{-1} M_1 - I\| \cdot w_1)$.

Finally, by (i), $\|M_2^{-1} M_1 - I\|$ can be made arbitrarily small. □
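The determinant fact used in (3.14) is easy to confirm on an example. The Python sketch below uses exact arithmetic; the matrix B, the shift c, the triangle, and the helper names are arbitrary choices for illustration. It applies $p \mapsto Bp + c$ to a triangle and compares the two areas.

```python
from fractions import Fraction as Fr


def area(tri):
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


def affine(B, c, p):
    """Apply p -> B p + c, with the 2x2 matrix B given row-wise."""
    return (B[0][0] * p[0] + B[0][1] * p[1] + c[0], B[1][0] * p[0] + B[1][1] * p[1] + c[1])


B = ((Fr(2), Fr(1)), (Fr(-1), Fr(3)))            # det B = 7
c = (Fr(5), Fr(-2))                                # the translation does not affect area
tri = ((Fr(0), Fr(0)), (Fr(3), Fr(1)), (Fr(1), Fr(4)))
image = tuple(affine(B, c, p) for p in tri)
det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
print(area(image), abs(det_B) * area(tri))
```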

3.4 The contradiction hypothesis and the limiting level set 

Our proof of Theorem 1.2 is by contradiction. Therefore we assume the following hypothesis for the
rest of Section 3 and hope to obtain a contradiction.

Hypothesis 1. Assume that the RNM algorithm is applied to $f \in \mathcal{F}$ and a nondegenerate initial
triangle, and that it does not converge to the minimizer of f.

We begin with a few easy consequences of Hypothesis 1. Let $\Delta_k$ be the RNM triangle at the
start of the kth iteration. Let $\tilde\Delta_k$ be that triangle in the coordinate frame determined by any one of
its vertices, and define its width $w_k$, height $h_k$, and flatness $\Gamma_k$ as in Definition 3.4.

Lemma 3.7. Assume Hypothesis 1. Then: 

(a) The diameter of $\Delta_k$ tends to 0.

(b) The RNM triangles have at least one limit point $p^*$.

(c) The function values at the vertices of $\Delta_k$ are greater than or equal to $f(p^*)$, and they tend to
$f(p^*)$.






(d) If Q is a neighborhood of the level set of $p^*$, then all the action of the algorithm is eventually
inside Q.

(e) We may choose Q to be a compact neighborhood not containing the minimizer of f; then there
is a positive lower bound on the smallest eigenvalue of the Hessian in Q.

(f) The diameter of $\tilde\Delta_k$ tends to zero.

(g) We have $w_k \to 0$ and $h_k \to 0$.
Proof. 

(a) This follows from Lemma 2.5 even without Hypothesis 1.

(b) Lemma 3.3 of [15] states that the best, next-worst, and worst function values in each successive
triangle cannot increase, and that at least one of them must strictly decrease at each iteration.
Because level sets are bounded, compactness guarantees that there is a limit point $p^*$.

(c) This follows from the monotonic decrease in function values, the shrinking of the diameter to
zero, and the continuity of f.

(d) Since the level sets are compact, there is a compact neighborhood I of $f(p^*)$ such that $f^{-1}(I)$
is a compact set contained in the interior of Q. By (c), the triangles are eventually contained in
$f^{-1}(I)$. By (a), eventually even the rejected points tested in each iteration lie within $f^{-1}(I)$.

(e) The first statement follows since the minimizer is not on the level set of $p^*$. The second statement
follows from the positive definiteness and uniform continuity of the Hessian on the compact set Q.

(f) By Lemma 3.1(iii), the distortion of the triangles is uniformly bounded.

(g) This follows from (f).

□

For the rest of Section 3 we may assume that all our RNM triangles and test points lie in a
compact set Q not containing the minimizer, as in Lemma 3.7(e). In particular, the implied bounds
in Lemma 3.1 are uniform.

3.5 Flattening of the RNM triangles 

Under Hypothesis 1, we now show that the transformed RNM triangles "flatten out" in the sense 
that the height becomes arbitrarily small relative to the width. The proof is again a proof by 
contradiction, showing that, unless the triangles flatten out, there must be a sequence of consecutive 
reflections in which the value of f at the reflection point is eventually less than $f(p^*)$, contradicting
Lemma 3.7(c).

Lemma 3.8. (Flattening of RNM triangles.) Assume Hypothesis 1. Then $\lim_{k\to\infty} h_k/w_k = 0$.

Proof. Assume that the result of the lemma does not hold. In other words, within the rest of this 
proof, the following hypothesis is assumed: 

Hypothesis 2. There exists ρ > 0 such that for arbitrarily large k we have $h_k/w_k > \rho$.

We may also assume that $p^\dagger$ is a limit point of the triangles for which $h_k/w_k > \rho$.

Given $\epsilon > 0$, we define a downward-pointing sector of points $(x, y)$ satisfying $y < \epsilon - \rho|x|/10$,
and a truncated sector of points in the downward sector that also satisfy $y > -\epsilon$; see Figure 3.

We now show that there exists $\epsilon > 0$ (depending on $f$ and $\rho$) such that, for any sufficiently
advanced iteration $k_0$ for which $h_{k_0}/w_{k_0} > \rho$,
Figure 3: The downward-pointing sector lies between the two finely dashed lines. The truncated sector
consists of the shaded area, for $\rho = 8$ and $\epsilon = 0.5$.

(a) $\widetilde\Delta_{k_0}$ is contained in the truncated sector.

(b) If $\widetilde\Delta$ is any transformed RNM triangle in the coordinates $(x, y)$ of $\mathfrak{F}_{k_0}$ such that $\widetilde\Delta$ is contained in the
truncated sector and has (transformed) width $w$ and height $h$ satisfying $h/w > \rho$, then

(i) One RNM iteration reflects $\widetilde\Delta$ to a new triangle $\widetilde\Delta'$. (And $\widetilde\Delta'$ has the same width and
height as $\widetilde\Delta$, by Lemma 3.5.)

(ii) The $y$-coordinate of the centroid of $\widetilde\Delta'$ is at least $88h/300$ below that of $\widetilde\Delta$.

(iii) $\widetilde\Delta'$ is contained in the downward-pointing sector.

(iv) If $\widetilde\Delta'$ is not contained in the truncated sector, then the function value at the new vertex is
less than $f(p^\dagger)$.

Starting from (a), applying (b) repeatedly shows that the triangle in the $(x, y)$ coordinates reflects
downward (each reflection lowers its centroid by at least $88h/300$) until it exits the truncated sector
through the bottom, at which point the function value at the exiting vertex is less than $f(p^\dagger)$, which
contradicts Lemma 3.7(c). Thus it remains to prove (a) and (b).

Proof of (a).

By definition of $\mathfrak{F}_{k_0}$, the point $(0, 0)$ is a vertex of $\widetilde\Delta_{k_0}$. For any given $\epsilon > 0$, if $k_0$ is sufficiently
large, then Lemma 3.7(f) shows that the diameter of $\widetilde\Delta_{k_0}$ is less than the distance from $(0, 0)$ to the
boundary of the truncated sector, so $\widetilde\Delta_{k_0}$ is entirely contained in the truncated sector.

Proof of (b).

Suppose that $\widetilde\Delta$ is contained in the truncated sector and satisfies $h/w > \rho$. Its vertices $\widetilde p_i = (x_i, y_i)$
are the transforms of the vertices $p_i$ of some RNM triangle $\Delta$. We use the notation $f_i = f(p_i)$ for any
subscript $i$, and similar abbreviations for other functions and coordinates.

We show first that the difference in $f$ values at any two vertices $\widetilde p_i$ and $\widetilde p_j$ is within $3h/100$ of
the difference of their $y$-coordinates. Using (3.5), we find that

(3.15)  $f_i - f_j = y_i - y_j + \tfrac12 (x_i^2 - x_j^2) + r_i - r_j.$

The quantity $|x_i^2 - x_j^2|$ is bounded by $2w|x_i| + w^2$. If $\epsilon < \rho^2/4000$, then $|x| < \rho/200$ for any point in
the truncated sector (since $\rho|x|/10 < \epsilon - y < 2\epsilon$ there). By Lemma 3.7(f), if $k_0$ is large enough, then $w < \rho/100$. It follows that

(3.16)  $w|x| < \dfrac{w\rho}{200} < \dfrac{h}{200}$ and $w^2 < \dfrac{\rho w}{100} < \dfrac{h}{100}$,

so $|x_i^2 - x_j^2| < h/50$. On the other hand, $r_i - r_j$ is the line integral of $(\partial r/\partial x, \partial r/\partial y)$ over a path
of length at most $w + h = O(h)$. Since $\partial r/\partial x$ and $\partial r/\partial y$ are $O(\max(|x|, |y|))$, the derivatives can
be made arbitrarily small on the truncated sector by choosing $\epsilon$ small enough, and we may assume
that $|r_i - r_j| < h/100$. Now (3.15) yields

(3.17)  $f_i - f_j = y_i - y_j + \zeta$, with $|\zeta| < \dfrac{3h}{100}$.

(i) Let $p_{\rm best}, p_{\rm next}, p_{\rm worst}$ be the vertices of $\widetilde\Delta$ ordered so that $f_{\rm best} \le f_{\rm next} \le f_{\rm worst}$. Let
$p_{\rm low}, p_{\rm mid}, p_{\rm high}$ be the same vertices ordered so that $y_{\rm low} \le y_{\rm mid} \le y_{\rm high}$. Recall that the
reflection point $p_r = p_{\rm best} + p_{\rm next} - p_{\rm worst}$ is accepted only if $f_r < f_{\rm next}$. Equation (3.17)
implies that $y_{\rm best}, y_{\rm next}, y_{\rm worst}$ are within $3h/100$ of $y_{\rm low}, y_{\rm mid}, y_{\rm high}$, respectively. Hence the
difference

$y_{\rm next} - y_r = y_{\rm worst} - y_{\rm best}$

is within $6h/100$ of $y_{\rm high} - y_{\rm low} = h$. Applying (3.17) to the reflected triangle gives
$f_{\rm next} - f_r > 94h/100 - 3h/100 > 0$, so the reflection point is accepted.

(ii) The reflection decreases the $y$-coordinate of the reflected vertex by

$y_{\rm worst} - y_r = 2y_{\rm worst} - y_{\rm best} - y_{\rm next},$

which is within $4(3h/100)$ of

$2y_{\rm high} - y_{\rm low} - y_{\rm mid} \ge y_{\rm high} - y_{\rm low} = h.$

Consequently, $y_{\rm worst} - y_r > 88h/100$, and the centroid drops by at least $88h/300$.

(iii) Furthermore, $x_r$ differs from $x_{\rm worst}$ by no more than $2w$, i.e., $|x_r| \le |x_{\rm worst}| + 2w$. Since $p_{\rm worst}$
lies in the truncated sector and $\rho w < h$, it follows that

$y_r + \dfrac{\rho}{10}|x_r| < y_{\rm worst} - \dfrac{88h}{100} + \dfrac{\rho}{10}\bigl(|x_{\rm worst}| + 2w\bigr) < y_{\rm worst} - \dfrac{88h}{100} + \dfrac{\rho}{10}|x_{\rm worst}| + \dfrac{2h}{10} < y_{\rm worst} + \dfrac{\rho}{10}|x_{\rm worst}| < \epsilon.$

Thus, in the local coordinate frame $\mathfrak{F}_{k_0}$, the reflection point $p_r$ lies in the downward-pointing
sector, and it also lies in the truncated sector as long as $y_r > -\epsilon$.

(iv) Let $b$ denote the base point of $\mathfrak{F}_{k_0}$, so that $b$ has transformed coordinates $(0, 0)$. For $p$ on the bottom edge of the truncated
sector, we have $y = -\epsilon$ and $x = O(\epsilon)$ as $\epsilon \to 0$ (similar triangles). Relation (3.5) then implies

(3.18)  $f(p) = f(b) - \epsilon + O(\epsilon^2).$

Fixing $\epsilon$ small enough that $f(p) - f(b) < 0$ everywhere on the bottom edge, we can
also fix a neighborhood $U$ of the bottom edge and a neighborhood $V$ of $(0, 0)$ such that
$f(p) < f(q)$ holds whenever $p \in U$ and $q \in V$.

If $\widetilde\Delta'$ is not in the truncated sector, its new vertex $p_r$ is within $w + h$ of the bottom edge. If
$k_0$ is large enough to make $w + h$ sufficiently small, it follows that $p_r \in U$.

By choice of $p^\dagger$ (defined immediately following Hypothesis 2), $k_0$ can be taken large enough
that $p^\dagger$ is arbitrarily close to $b$ in untransformed coordinates. By Lemma 3.1(iii), the matrix
defining the local coordinate transformation is bounded and nonsingular. Hence we can make the transform of
$p^\dagger$ arbitrarily close to $(0, 0)$, and in particular we can guarantee that it lies in $V$.

Thus $f(p_r) < f(p^\dagger)$. □
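As a numerical illustration of part (b), the following minimal sketch applies RNM reflections to a hypothetical tall, thin triangle under the local model $y + \tfrac12 x^2$ from (3.5), with $f(b) = 0$ and the remainder $r$ set to zero (an assumption made only for illustration; the lemma itself allows a small remainder). The triangle and all helper names are invented for this example. It checks that every reflection is accepted and that the centroid drops by at least $88h/300$ each time, as asserted in (b)(i) and (b)(ii).

```python
from fractions import Fraction


def f_local(p):
    """Local model y + x^2/2 from (3.5), with f(b) = 0 and remainder r dropped (assumption)."""
    x, y = p
    return y + x * x / 2


def width_height(tri):
    """Transformed width (x-extent) and height (y-extent) of a triangle."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) - min(xs), max(ys) - min(ys)


def try_reflection(tri, f):
    """One RNM reflection test: accept p_r = p_best + p_next - p_worst only if f_r < f_next."""
    best, nxt, worst = sorted(tri, key=f)
    p_r = (best[0] + nxt[0] - worst[0], best[1] + nxt[1] - worst[1])
    if f(p_r) < f(nxt):
        return True, [best, nxt, p_r]
    return False, tri  # RNM would contract here; contractions are not modelled in this sketch


def centroid_y(tri):
    return sum(p[1] for p in tri) / 3


def run_reflections(tri, f, steps):
    """Apply up to `steps` reflections, logging (accepted, drop in centroid height)."""
    log = []
    for _ in range(steps):
        before = centroid_y(tri)
        accepted, tri = try_reflection(tri, f)
        log.append((accepted, before - centroid_y(tri)))
        if not accepted:
            break
    return log


# Hypothetical triangle with w = 1/1000 and h = 1/100, so h/w = 10.
tri0 = [(Fraction(0), Fraction(0)),
        (Fraction(1, 1000), Fraction(1, 200)),
        (Fraction(1, 2000), Fraction(1, 100))]
w, h = width_height(tri0)
log = run_reflections(tri0, f_local, steps=6)
for accepted, drop in log:
    print(accepted, float(drop), drop >= Fraction(88, 300) * h)
```

Exact rational arithmetic is used so that the comparisons in the acceptance test are not affected by rounding.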



Remark 3.9. An important consequence of Lemma 3.8 is that, for sufficiently large $k$, $w > h$ for $\Delta_k$ measured in a coordinate
frame associated to any one of its vertices, so that Lemma 3.6(iii) can be applied with $C = 1$.



3.6 The distance travelled during a sequence of reflections 




We now show that a sequence of valid reflections, starting from a sufficiently advanced iteration, 
does not move the triangle far. This result limits the possible change in flatness caused by moving 
the base point of the local coordinate system from the first to last triangle in the series of reflections. 

Lemma 3.10. Assume Hypothesis 1. Given $\kappa > 0$, the following is true for any sufficiently large $k_0$
and any $k > k_0$: if all steps taken by the RNM algorithm from $\Delta_{k_0}$ to $\Delta_k$ are reflections, then the
distance between the transformed centroids of $\Delta_{k_0}$ and $\Delta_k$ is less than $\kappa$ (where we use a coordinate
frame whose base point is a vertex of $\Delta_{k_0}$).

Proof. We work in the coordinates $(x, y)$ of $\mathfrak{F}_{k_0}$. It suffices to show that for sufficiently small
positive $\epsilon < \kappa/2$, if $k_0$ is sufficiently large and $\widetilde\Delta$ is a later transformed RNM triangle with centroid in the box
$\{|x| \le \epsilon,\ |y| \le \epsilon\}$, then the next move does not reflect $\widetilde\Delta$ so that its centroid exits the box. More
precisely, for suitable $\epsilon$ and $k_0$, the idea is to prove:

(a) The centroid cannot escape out the top of the box (i.e., the $y$-coordinate cannot increase beyond
$\epsilon$) because the function value of the reflection point would exceed the function values of $\Delta_{k_0}$
(i.e., the function values near the center of the box).

(b) The centroid cannot escape out the bottom because the function value there would be less than
the limiting value $f(p^\dagger)$.

(c) The centroid cannot escape out either side, because $\widetilde\Delta$ will be flat enough that the
function values there are controlled mainly by the $x$-coordinates, which force the triangle to
reflect inward toward the line $x = 0$.

The conditions on $\epsilon$ and $k_0$ will be specified in the course of the proof.
Proof of (a).

We copy the argument used in proving (b)(iv) of Lemma 3.8. Let $b$ be the base point used to
define $\mathfrak{F}_{k_0}$. For $p$ along the top edge of the box, by definition $y = \epsilon$ (and $|x| \le \epsilon$). Thus the same argument that
proved (3.18) shows that

$f(p) = f(b) + \epsilon + O(\epsilon^2),$

and that if $\epsilon$ is sufficiently small, then there are neighborhoods $U$ of the top edge and $V$ of $(0, 0)$
such that $f(p) > f(q)$ holds whenever $p \in U$ and $q \in V$. If $k_0$ is sufficiently large and $\widetilde\Delta$ is the
later triangle whose centroid is about to exit the box through the top, then by Lemma 3.7(f), $\widetilde\Delta_{k_0}$
and $\widetilde\Delta$ are small enough that $\widetilde\Delta_{k_0} \subset V$ and $\widetilde\Delta \subset U$. The function values at the vertices of $\widetilde\Delta$ would then be greater
than those of $\Delta_{k_0}$, which is impossible because function values at the vertices of successive RNM triangles
are nonincreasing.

Proof of (b).

This case is even closer to the proof of (b)(iv) in Lemma 3.8. That argument shows that if $\epsilon$ is
sufficiently small and $k_0$ is sufficiently large, then the function values at the vertices of a triangle
$\widetilde\Delta$ whose transformed centroid is about to exit through the bottom are strictly less than the value
$f(p^\dagger)$ (which is made arbitrarily close to $f(b)$ by taking $k_0$ large). This contradicts Lemma 3.7(c).

Proof of (c).

By symmetry, suppose that $\widetilde\Delta$ reflects so that its centroid exits the box through the right side.
By Lemma 3.7(g) and Lemma 3.8, we may take $k_0$ large enough that

(3.19)  $w_{k_0} < 0.01\epsilon$ and $h_{k_0} < 0.01\epsilon\, w_{k_0}$.

The width $w$ and height $h$ of $\widetilde\Delta$ are the same as those of $\widetilde\Delta_{k_0}$. So all vertices of $\widetilde\Delta$ satisfy $0.99\epsilon < x <
1.01\epsilon$ and $-1.01\epsilon < y < 1.01\epsilon$. Let $v = (x, y)$ and $v' = (x + \delta_x, y + \delta_y)$ be two such vertices.
We claim that if $\delta_x > w/10$, then $f(v') > f(v)$. By (3.5),

$f(v') - f(v) = x\delta_x + \tfrac12\delta_x^2 + \delta_y + \bigl(r(x + \delta_x, y + \delta_y) - r(x, y)\bigr)$

$\quad > (0.99\epsilon)(w/10) + 0 - h - \bigl(o(\epsilon)w + O(\epsilon)h\bigr)$  (by integrating the derivative bounds of Lemma 3.1)

$\quad > 0.099\epsilon w - h - 0.001\epsilon w - h$  (if $\epsilon$ is sufficiently small)

$\quad = 0.098\epsilon w - 2h$

$\quad > 0$  (by the second inequality in (3.19)).

Now we can mimic part of the proof of (b) in Lemma 3.8, but in the horizontal rather than the
vertical direction. Let $x_{\rm best}, x_{\rm next}, x_{\rm worst}$ be the $x$-coordinates of the vertices ordered by increasing
function value, and let $x_{\rm left}, x_{\rm mid}, x_{\rm right}$ be the same $x$-coordinates in increasing order. The previous
paragraph shows that $x_{\rm best}, x_{\rm next}, x_{\rm worst}$ are within $w/10$ of $x_{\rm left}, x_{\rm mid}, x_{\rm right}$, respectively. The
reflection decreases the $x$-coordinate of the reflected vertex by

$x_{\rm worst} - x_r = 2x_{\rm worst} - x_{\rm best} - x_{\rm next},$

which is within $4(w/10)$ of

$2x_{\rm right} - x_{\rm left} - x_{\rm mid} \ge x_{\rm right} - x_{\rm left} = w,$

so the $x$-coordinate of the centroid decreases (by at least $0.6w/3 = 0.2w$) instead of increasing beyond $\epsilon$ as hypothesized. □
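Part (c) can be illustrated in the same spirit. The sketch below assumes the pure local model $y + \tfrac12 x^2$ (remainder dropped) and places a hypothetical flat triangle against the right edge $x = \epsilon$ of the box, with $w < 0.01\epsilon$ and $h < 0.01\epsilon w$ as in (3.19). The specific vertices and function names are invented for illustration; the check is that the accepted reflection moves the centroid left by at least $0.2w$.

```python
from fractions import Fraction


def f_local(p):
    """Local model y + x^2/2 (remainder r dropped; an assumption for illustration)."""
    x, y = p
    return y + x * x / 2


def reflect_once(tri, f):
    """Triangle after one RNM reflection, or None if the reflection would be rejected."""
    best, nxt, worst = sorted(tri, key=f)
    p_r = (best[0] + nxt[0] - worst[0], best[1] + nxt[1] - worst[1])
    return [best, nxt, p_r] if f(p_r) < f(nxt) else None


def centroid_x(tri):
    return sum(p[0] for p in tri) / 3


eps = Fraction(1, 10)
# Hypothetical flat triangle near x = eps: w = 8/10000 < 0.01*eps, h = 1e-7 < 0.01*eps*w.
tri = [(Fraction(996, 10000), Fraction(0)),
       (Fraction(1004, 10000), Fraction(5, 10**8)),
       (eps, Fraction(1, 10**7))]
w = max(p[0] for p in tri) - min(p[0] for p in tri)
new_tri = reflect_once(tri, f_local)
shift = centroid_x(tri) - centroid_x(new_tri)
print(float(centroid_x(tri)), float(centroid_x(new_tri)), shift >= w / 5)
```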
3.7 Conditions at an advanced contraction 

Assuming Hypothesis 1, we next show that whenever a contraction step is taken at a sufficiently
advanced iteration $k$, we have $h_k = O(w_k^2)$. We stress the assumption that the base point of the local
coordinate frame at iteration $k$ lies inside $\Delta_k$.

Lemma 3.11. Assume Hypothesis 1. If $k$ is sufficiently large and a contraction step is taken at
iteration $k$ (meaning that the reflection point was not accepted), then the transformed height $h$ and
width $w$ of $\Delta_k$, in a coordinate frame with base point inside $\Delta_k$, must satisfy $h \le 10w^2$.

Proof. Given a base point of the local coordinate frame in $\Delta_k$, Lemma 3.1 shows that the difference
in values of $f$ at any two points $p$ and $v$ is

(3.20)  $f(p) - f(v) = y_p - y_v + \tfrac12(x_p^2 - x_v^2) + r(x_p, y_p) - r(x_v, y_v).$

For $i \in \{1, 2, 3\}$, let $p_i$ be the $i$th vertex of $\Delta_k$ and let $\widetilde p_i = (x_i, y_i)$ be its transform in the local coordinate
frame. We assume throughout the proof that $p_3$ is the worst vertex. Let $p_r := p_1 + p_2 - p_3$ be the
reflection point, and let $\widetilde p_r = (x_r, y_r)$ be its transform.

The origin of the coordinate frame is inside $\widetilde\Delta_k$, so $|x_i| \le w$ for $i = 1, 2, 3$. The RNM triangles
are flattening out (Lemma 3.8), and the flatness does not change very much when measured using
the coordinate frame with a nearby base point (Lemma 3.6(ii)). Hence, if $k$ is large enough, $h < w$,
so $|y_i| < w$ for $i = 1, 2, 3$. Since $p_3$ is the worst vertex, $f(p_3) - f(p_1) \ge 0$. Substituting (3.20) and
rearranging yields

(3.21)  $y_3 - y_1 \ge \tfrac12(x_1^2 - x_3^2) + r(x_1, y_1) - r(x_3, y_3).$

Because $|x_1| \le w$ and $|x_3| \le w$, we obtain $|x_1^2 - x_3^2| \le w^2$, so the inequality (3.21) implies

(3.22)  $y_3 - y_1 \ge -\tfrac12 w^2 + r(x_1, y_1) - r(x_3, y_3).$

Next we use the definition of the reflection point to obtain bounds in the other direction. A
contraction occurs only when the reflection point is not accepted (see Step 3 of Algorithm RNM),
which implies that $f(p_r) - f(p_2) \ge 0$. Substituting (3.20) and rearranging yields

(3.23)  $y_r - y_2 \ge \tfrac12(x_2^2 - x_r^2) + r(x_2, y_2) - r(x_r, y_r).$
By definition of $p_r$, we have $y_r - y_2 = y_1 - y_3$. Substituting into the left-hand side of (3.23) yields
(3.24)  $y_1 - y_3 \ge \tfrac12(x_2^2 - x_r^2) + r(x_2, y_2) - r(x_r, y_r).$

We have $|x_2| \le w$ and $|x_r - x_2| = |x_1 - x_3| \le w$, so

$|x_2^2 - x_r^2| = |x_2 + x_r| \cdot |x_2 - x_r| \le 3w^2,$

and substituting into (3.24) yields

(3.25)  $y_1 - y_3 \ge -\tfrac32 w^2 + r(x_2, y_2) - r(x_r, y_r).$

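If $k$ is sufficiently large, we know from Lemmas 2.5 and 3.1 that, in the smallest box containing
a transformed advanced RNM triangle and its reflection point, $|\partial r/\partial x| \le w$ and $|\partial r/\partial y| \le \tfrac12$.
Consequently (recall that $\widetilde p_r - \widetilde p_2 = \widetilde p_1 - \widetilde p_3$),

(3.26)  $|r(x_1, y_1) - r(x_3, y_3)| \le w|x_1 - x_3| + \tfrac12|y_1 - y_3| \le w^2 + \tfrac12|y_1 - y_3|,$
$\qquad\ \ |r(x_2, y_2) - r(x_r, y_r)| \le w|x_1 - x_3| + \tfrac12|y_1 - y_3| \le w^2 + \tfrac12|y_1 - y_3|.$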

Substituting the bounds (3.26) into (3.22) and (3.25), respectively, we obtain

(3.27)  $y_3 - y_1 \ge -\tfrac32 w^2 - \tfrac12|y_1 - y_3|$ and $y_1 - y_3 \ge -\tfrac52 w^2 - \tfrac12|y_1 - y_3|.$

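These imply $y_3 - y_1 \ge -3w^2$ and $y_1 - y_3 \ge -5w^2$ (for instance, if $y_3 < y_1$ the first inequality reads $\tfrac12(y_3 - y_1) \ge -\tfrac32 w^2$), so $|y_1 - y_3| \le 5w^2$. Our numbering of $p_1$ and $p_2$
was arbitrary, so $|y_2 - y_3| \le 5w^2$ too. These two inequalities imply $h \le 10w^2$. □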
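A randomized sanity check of Lemma 3.11 can be sketched under a simplifying assumption: take the remainder $r \equiv 0$, so that $f = y + \tfrac12 x^2$ exactly. With $r \equiv 0$ the argument above gives $|y_1 - y_3| \le \tfrac32 w^2$ and $|y_2 - y_3| \le \tfrac32 w^2$, hence $h \le 3w^2$, which is well inside the bound $10w^2$. The sampler below is hypothetical; it only keeps triangles whose $x$-range contains the base point $x = 0$, which is what the proof uses to get $|x_i| \le w$.

```python
import random
from fractions import Fraction


def f_local(p):
    """Local model y + x^2/2 with the remainder r set to zero (an assumption)."""
    x, y = p
    return y + x * x / 2


def reflection_rejected(tri):
    """True if RNM would contract: f(p_r) >= f(p_next) for p_r = p_best + p_next - p_worst."""
    best, nxt, worst = sorted(tri, key=f_local)
    p_r = (best[0] + nxt[0] - worst[0], best[1] + nxt[1] - worst[1])
    return f_local(p_r) >= f_local(nxt)


def random_triangle(rng):
    """Hypothetical sampler: rational vertices whose x-range contains the base point x = 0."""
    while True:
        tri = [(Fraction(rng.randint(-100, 100), 100), Fraction(rng.randint(-100, 100), 100))
               for _ in range(3)]
        xs = [x for x, _ in tri]
        if min(xs) <= 0 <= max(xs) and min(xs) < max(xs):
            return tri


def max_h_over_w2(trials=3000, seed=1):
    """Count contractions and the largest h / w^2 among triangles that would contract."""
    rng = random.Random(seed)
    worst_ratio, contractions = Fraction(0), 0
    for _ in range(trials):
        tri = random_triangle(rng)
        if reflection_rejected(tri):
            contractions += 1
            xs = [x for x, _ in tri]
            ys = [y for _, y in tri]
            w, h = max(xs) - min(xs), max(ys) - min(ys)
            worst_ratio = max(worst_ratio, h / (w * w))
    return contractions, worst_ratio


contractions, ratio = max_h_over_w2()
print(contractions, float(ratio))
```

Because the arithmetic is exact, the bound $h/w^2 \le 3$ should hold for every sampled contraction, not just on average.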

Remark 3.12. The lemma just proved applies not to an RNM triangle at an arbitrary iteration, but
only at a sufficiently advanced iteration $k$. Even for large $k$, the condition $h \le 10w^2$ is necessary
but not sufficient for a contraction to occur.

Figures 4 and 5 illustrate two cases for the function $\tfrac12 x^2 + y + \tfrac12 y^2$. The worst vertex is at the
origin in each figure. In Figure 4 we have $h = 1.2 \times 10^{-6}$ and $w = 2 \times 10^{-4}$, so $h/w^2 = 30$; as
Lemma 3.11 would predict at an advanced iteration, the triangle reflects instead of contracting. In
Figure 5, by contrast, $h = 3 \times 10^{-8}$ and $w = 2 \times 10^{-4}$, so $h/w^2 = 3/4$, and an outside contraction is
taken. The vertical scale in each figure is greatly compressed compared to the horizontal, and the
vertical scale in Figure 4 differs from that in Figure 5 by two orders of magnitude.
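The two regimes of the remark can be reproduced with hypothetical triangles of the same width and ratios $h/w^2$. The vertex layout below (worst vertex at the origin, the other two at $(w/2, -a)$ and $(w, -b)$) is an assumption chosen for illustration and is not the layout plotted in the figures; the move classification follows the usual RNM rules (reflection if $f_r < f_{\rm next}$, outside contraction if $f_{\rm next} \le f_r < f_{\rm worst}$, inside contraction otherwise).

```python
from fractions import Fraction


def f_fig(p):
    """The function used in Figures 4 and 5: x^2/2 + y + y^2/2."""
    x, y = p
    return x * x / 2 + y + y * y / 2


def rnm_move(tri, f):
    """Classify the RNM move for a triangle (RNM has no expansion step)."""
    best, nxt, worst = sorted(tri, key=f)
    p_r = (best[0] + nxt[0] - worst[0], best[1] + nxt[1] - worst[1])
    f_r = f(p_r)
    if f_r < f(nxt):
        return "reflection"
    if f_r < f(worst):
        return "outside contraction"
    return "inside contraction"


def layout(w, a, b):
    """Hypothetical layout: worst vertex at the origin, others at (w/2, -a) and (w, -b).
    The height of this triangle is max(a, b)."""
    return [(Fraction(0), Fraction(0)), (w / 2, -a), (w, -b)]


w = Fraction(2, 10**4)
tri_30 = layout(w, 30 * w * w, 30 * w * w)               # h / w^2 = 30
tri_34 = layout(w, w * w / 2, Fraction(3, 4) * w * w)    # h / w^2 = 3/4
print(rnm_move(tri_30, f_fig), rnm_move(tri_34, f_fig))
```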







Figure 4: The contours of $y + \tfrac12 x^2 + \tfrac12 y^2$ are shown along with an RNM triangle with $h/w^2 = 30$. The
reflection is accepted.

Lemma 3.13. Under the assumptions of Lemma 3.11, if $k$ is sufficiently large and a contraction
step is taken at iteration $k$, then $\Gamma_k < 10$, where $\Gamma_k$ is the flatness of $\Delta_k$, as in Definition 3.4.

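Proof. Let $w$, $h$, $A$ be the width, height, and area of $\Delta_k$ with respect to the coordinate frame
associated by Lemma 3.1 to a vertex of $\Delta_k$. If $k$ is sufficiently large, then Lemma 3.11 implies
$h \le 10w^2$. Hence, since a triangle of width $w$ and height $h$ has area at most $wh/2$,
$\Gamma_k = A/w^3 \le h/(2w^2) \le 5 < 10$. □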
Figure 5: The contours of $y + \tfrac12 x^2 + \tfrac12 y^2$ are shown along with an RNM triangle with $h/w^2 = 3/4$. The
reflection step is not accepted, and an outside contraction is performed. Note the difference, by four orders
of magnitude, between the horizontal and vertical scales.



3.8 Eliminating the impossible: increasing flatness is unavoidable 

The final piece of the proof of Theorem 1.2 will show that, for sufficiently advanced iterations, the
flatness of the RNM triangles must increase by a factor of at least 1.001 within a specified number of
iterations following a contraction. To obtain this result, we begin by characterizing the structure of
RNM vertices at sufficiently advanced iterations following a contraction, and then define a related
but simpler triangle.



3.8.1 A simpler triangle. 

Assume that (i) there is a limit point $p^\dagger$ of the RNM triangles that is not the minimizer of $f$, (ii) $k_0$
is sufficiently large, and (iii) iteration $k_0$ is a contraction. For the RNM triangle $\Delta_{k_0}$, let $\mathfrak{F}_1$ denote
the coordinate frame whose base point is the vertex of $\Delta_{k_0}$ with the worst value of $f$:

(3.28)  $\operatorname{base}(\mathfrak{F}_1) = (p_{\rm worst})_{k_0}.$

This first coordinate frame is used to identify $\widetilde p_{\rm left}$ and $\widetilde p_{\rm right}$, the transformed vertices of $\Delta_{k_0}$ with
leftmost and rightmost $x$-coordinates.

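A second coordinate frame, $\mathfrak{F}_2$, is defined next, whose base point (measured in $\mathfrak{F}_1$) is the
midpoint of $[\widetilde p_{\rm left}, \widetilde p_{\rm right}]$:

(3.29)  $\operatorname{base}(\mathfrak{F}_2) = \tfrac12(\widetilde p_{\rm left} + \widetilde p_{\rm right}).$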

Unless otherwise specified, the coordinate frame $\mathfrak{F}_2$ is used throughout the remainder of this proof.
The base points of $\mathfrak{F}_1$ and $\mathfrak{F}_2$ will be arbitrarily close if $k_0$ is sufficiently large.

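We assume that $k_0$ is sufficiently large so that the RNM triangles have become tiny in diameter
and flattened out (Lemma 3.8). The reason for defining $\mathfrak{F}_2$ is that we can choose a small $\eta > 0$ such
that the three transformed vertices of $\Delta_{k_0}$, measured in coordinate frame $\mathfrak{F}_2$, may be expressed as

(3.30)  $a_0 = \begin{pmatrix} -\eta \\ -u\eta^2 \end{pmatrix}, \quad b_0 = \begin{pmatrix} s\eta \\ t\eta^2 \end{pmatrix}, \quad\text{and}\quad c_0 = \begin{pmatrix} \eta \\ u\eta^2 \end{pmatrix},$

where vertex $a_0$ corresponds to $\widetilde p_{\rm left}$ and vertex $c_0$ to $\widetilde p_{\rm right}$.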

Without loss of generality the value of $s$ in (3.30) can be taken as nonnegative. The vertices $a_0$
and $c_0$ were leftmost and rightmost when measured in $\mathfrak{F}_1$; by Lemma 3.6(ii), the $s$ in (3.30) cannot
be much larger than 1. We assume that $k_0$ is large enough that $0 \le s < 1.00001$.

Because of the form of the vertices in (3.30) and the bounds on $s$, the transformed width $w$
of $\Delta_{k_0}$ (measured in $\mathfrak{F}_2$) can be no larger than $2.00001\eta$. Iteration $k_0$ is, by
assumption, a contraction, so it follows from Lemma 3.11 that the transformed height of $\Delta_{k_0}$ satisfies
$h \le 10w^2$, and hence $h < 40.0005\eta^2$. Since $h$ is equal to the larger of $2|u|\eta^2$ and $(|u| + |t|)\eta^2$, it follows
that $|u| < 40.0005$ and $|t| < 40.0005$ in (3.30).

If $\Delta$ and $\Delta'$ are any two consecutive RNM triangles in which the same coordinate frame is used,
the new vertex of $\Delta'$ is a linear combination of the vertices of $\Delta$, with rational coefficients determined
by the choice of worst vertex and the nature of the move (see (3.1) to (3.3)). Furthermore, the values
of $w$ and $h$ in $\Delta$ and $\Delta'$ remain the same or decrease, and if $v$ is any vertex of $\Delta$ and $v'$ is any
vertex of $\Delta'$, then $|x_{v'} - x_v| \le 2w$ and $|y_{v'} - y_v| \le 2h$. Thus, after $\ell \ge 0$ moves, we reach a triangle
$\Delta_{k_0+\ell}$ for which each transformed vertex $v$ has the form

(3.31)  $v = \begin{pmatrix} \lambda\eta \\ \mu\eta^2 \end{pmatrix}$, where $|\lambda| \le 1.00001 + 4.00002\ell$ and $|\mu| \le 40.0005(1 + 2\ell)$.
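The constants in (3.31), and the bounds $|\lambda_i| \le 82$ and $|\mu_i| \le 3000$ used below for $\ell \le 20$, can be checked with a few lines of exact arithmetic. This is a bookkeeping sketch only; the names are hypothetical. Each move shifts $x$ by at most $2w \le 4.00002\eta$ and $y$ by at most $2h \le 2(40.0005)\eta^2$.

```python
from fractions import Fraction

S_MAX = Fraction("1.00001")      # upper bound on s assumed in the text
W_OVER_ETA = 1 + S_MAX           # width bound 2.00001 * eta from the form (3.30)
H_COEFF = Fraction("40.0005")    # bound on h / eta^2 used in the text


def vertex_bounds(ell):
    """Bounds (3.31) on |lambda| and |mu| after ell moves in the same frame."""
    lam = S_MAX + 2 * W_OVER_ETA * ell
    mu = H_COEFF * (1 + 2 * ell)
    return lam, mu


# h <= 10 w^2 <= 10 (2.00001 eta)^2 must be covered by 40.0005 eta^2.
assert 10 * W_OVER_ETA ** 2 <= H_COEFF

lam20, mu20 = vertex_bounds(20)
print(float(lam20), float(mu20))
```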




3.8.2 Rescaled inequalities associated with RNM moves. 

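The next step is to rescale coordinates to define a triangle $\widehat\Delta_\ell$ that is related to $\Delta_{k_0+\ell}$
by the diagonal affine transformation $\operatorname{diag}(\eta, \eta^2)$. Let $\widetilde p = (\lambda\eta, \mu\eta^2)$ be a point in $\Delta_{k_0+\ell}$ measured
in $\mathfrak{F}_2$. Then

(3.32)  $\widetilde p = \begin{pmatrix} \lambda\eta \\ \mu\eta^2 \end{pmatrix}$ corresponds to $\widehat p = \begin{pmatrix} \lambda \\ \mu \end{pmatrix}$ (a point in $\widehat\Delta_\ell$),

where $\lambda$ and $\mu$ satisfy the bounds (3.31). The flatness of $\widehat\Delta_\ell$, defined as $\operatorname{area}(\widehat\Delta_\ell)/(\operatorname{width}(\widehat\Delta_\ell))^3$, is
equal to the flatness of $\Delta_{k_0+\ell}$ measured in coordinate frame $\mathfrak{F}_2$ (the rescaling multiplies both the area and the cubed width by $\eta^3$).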

Assume now that $\ell \le 20$; the reason for this limit on $\ell$ will emerge later in Proposition 3.15.
For vertex $i$ of $\Delta_{k_0+\ell}$, equation (3.31) shows that the coefficients in its transformed coordinates
satisfy $|\lambda_i| \le 82$ and $|\mu_i| \le 3000$. By (3.5), (3.6), and (3.31), once $k_0$ is large enough that the $o(\eta^2)$ term below is
sufficiently small relative to $\eta^2$, the difference in $f$ values between vertices $i$ and $j$ is

$f(v_i) - f(v_j) = \eta^2\bigl[(\tfrac12\lambda_i^2 + \mu_i) - (\tfrac12\lambda_j^2 + \mu_j)\bigr] + r(\lambda_i\eta, \mu_i\eta^2) - r(\lambda_j\eta, \mu_j\eta^2)$

(3.33)  $\qquad = \eta^2\bigl[(\tfrac12\lambda_i^2 + \mu_i) - (\tfrac12\lambda_j^2 + \mu_j)\bigr] + o(\eta^2).$

Let $\psi$ denote the simple quadratic function

(3.34)  $\psi(\lambda, \mu) := \tfrac12\lambda^2 + \mu.$

Then (3.33) shows that, if $k_0$ is large enough, the following relationship holds between $f$ at vertices
of $\Delta_{k_0+\ell}$ and $\psi$ at vertices of $\widehat\Delta_\ell$:

(3.35)  $f(v_i) \ge f(v_j)$ implies $\psi(\lambda_i, \mu_i) > \psi(\lambda_j, \mu_j) - 10^{-6}$,

where $10^{-6}$ is not magical, but simply a number small enough that our subsequent results follow.

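Example 3.14. For illustration, let $\ell = 0$. Based on (3.30), the vertices of $\widehat\Delta_0$ are given by

(3.36)  $A_0 = \begin{pmatrix} -1 \\ -u \end{pmatrix}, \quad B_0 = \begin{pmatrix} s \\ t \end{pmatrix}, \quad\text{and}\quad C_0 = \begin{pmatrix} 1 \\ u \end{pmatrix},$

and suppose that $a_0$ is the worst transformed vertex of $\Delta_{k_0}$, i.e., that

$f(a_0) \ge f(b_0)$ and $f(a_0) \ge f(c_0)$.

Application of (3.35) gives $\psi(-1, -u) > \psi(s, t) - 10^{-6}$ and $\psi(-1, -u) > \psi(1, u) - 10^{-6}$, i.e.,

$\tfrac12 - u > \tfrac12 s^2 + t - 10^{-6}$ and $10^{-6} > 2u$ (a simplification of $\tfrac12 - u > \tfrac12 + u - 10^{-6}$).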
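The tolerant comparison in (3.35) and the two conditions of Example 3.14 translate directly into code. The sketch below (function names hypothetical) encodes $\psi$, the necessary condition "$\psi(p) > \psi(q) - 10^{-6}$", and both the raw and the simplified forms of the conditions for $A_0$ to be worst, so that their agreement can be checked on sample values of $(s, t, u)$.

```python
from fractions import Fraction

TOL = Fraction(1, 10**6)   # the 10^-6 slack in (3.35)


def psi(p):
    """Simple quadratic (3.34): psi(lambda, mu) = lambda^2/2 + mu."""
    lam, mu = p
    return lam * lam / 2 + mu


def may_be_no_better(p, q):
    """Necessary condition from (3.35) for f at p to be >= f at q."""
    return psi(p) > psi(q) - TOL


def a0_can_be_worst(s, t, u):
    """Conditions of Example 3.14 for A_0 = (-1, -u) to be the worst vertex."""
    s, t, u = Fraction(s), Fraction(t), Fraction(u)
    A0, B0, C0 = (Fraction(-1), -u), (s, t), (Fraction(1), u)
    return may_be_no_better(A0, B0) and may_be_no_better(A0, C0)


def a0_can_be_worst_simplified(s, t, u):
    """Simplified form: 1/2 - u > s^2/2 + t - 1e-6 and 1e-6 > 2u."""
    s, t, u = Fraction(s), Fraction(t), Fraction(u)
    return (Fraction(1, 2) - u > s * s / 2 + t - TOL) and (TOL > 2 * u)


print(a0_can_be_worst(Fraction(1, 2), -1, 0), a0_can_be_worst(0, 0, 1))
```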

In this way, inequalities characterizing the transformed vertices (3.31) of $\Delta_{k_0+\ell}$ when the
RNM algorithm is applied to $f$ can be derived in terms of the vertices of the simpler triangle $\widehat\Delta_\ell$
when the RNM algorithm is applied to $\psi(\lambda, \mu)$, except that both possible outcomes of
a comparison must be allowed if the two values of $\psi$ are within $10^{-6}$. The importance of (3.35) is
that, for $\ell \le 20$, each possible sequence of RNM moves (specifying the move type and worst vertex) leads
to a set of algebraic inequalities in $s$, $t$, and $u$.
3.9 Flatness must increase after no more than 14 steps

In the remainder of this section, we consider the transformed width, area, and flatness of a sequence
of RNM triangles $\Delta_{k_0}, \ldots, \Delta_{k_0+\ell}$, defined using a coordinate frame whose base point is in $\Delta_{k_0}$.
Accordingly, notation is needed that separately identifies the RNM triangle being measured and the
relevant coordinate frame. The value $\Gamma^{(1)}_k$ will denote the flatness of RNM triangle $\Delta_k$ measured in
$\mathfrak{F}_1$ of (3.28), and $\Gamma^{(2)}_k$ will denote the flatness of $\Delta_k$ measured in $\mathfrak{F}_2$ of (3.29), with similar notation
for $w$ and $A$. Since the base points of coordinate frames $\mathfrak{F}_1$ and $\mathfrak{F}_2$ are in $\Delta_{k_0}$, an essential point is
that, when $k > k_0$, the triangle containing the base point of the coordinate frame is different from
the triangle being measured.

The result in the following proposition was found using symbolic computation software. 

Proposition 3.15. Assume Hypothesis 1. If $k_0$ is sufficiently large and a contraction step is taken
at iteration $k_0$, then there exists $\ell$ with $1 \le \ell \le 14$ such that $\Gamma^{(2)}_{k_0+\ell} \ge 1.01\,\Gamma^{(2)}_{k_0}$.

Before giving the proof, we sketch the basic idea. As just described in Section 3.8.2, we are
in a situation where two properties apply: (1) the transformed objective function at the scaled
point $(\lambda, \mu)^T$ can be very well approximated by the quadratic function $\psi(\lambda, \mu) := \tfrac12\lambda^2 + \mu$ in (3.34),
and (2) the RNM move sequences of interest can be analyzed by beginning with an initial simplified
(scaled) triangle whose vertices (see (3.36)) involve bounded scalars $(s, t, u)$ that lie in a compact set.
Under these conditions, the proof explains how algebraic constraints can be derived that characterize
geometrically valid sequences of RNM moves. Further algebraic constraints involving $s$ can also be
defined that must be satisfied when the flatness increases by a factor of no more than 1.01.

In principle, one could establish the result of the proposition by numerically checking flatness 
for all geometrically valid RNM move sequences beginning with the simplified triangle, but this 
approach is complicated, structureless, and too time-consuming for numerical calculation. Instead, 
we used Mathematica™ 7.0 to construct symbolic inequalities representing RNM move sequences 
such that 

• s, t, and u are suitably bounded, 

• the geometric condition (3.35) for a valid RNM move applies, and

• the flatness increases by a factor of less than or equal to 1.01.

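Proof of Proposition 3.15. The flatness is not changed by a reflection step as long as the same
coordinate frame is retained. Assuming that $k_0$ is sufficiently large and that the move taken during
iteration $k_0$ is a contraction, we wish to show that there is an index $\ell$ satisfying $1 \le \ell \le 14$ such
that the flatness of the RNM triangle $\Delta_{k_0+\ell}$, measured in coordinate frame $\mathfrak{F}_2$, is a factor
of at least 1.01 larger than the flatness of $\Delta_{k_0}$, i.e., that

(3.37)  $\Gamma^{(2)}_{k_0+\ell} \ge 1.01\,\Gamma^{(2)}_{k_0}.$

Let us prove (3.37) directly for $\ell = 1$ when $A_0$ of (3.36) is the worst vertex of $\widehat\Delta_0$ and an inside
contraction occurs. In this case, the next triangle $\widehat\Delta_1$ has vertices

(3.38)  $A_1 = \tfrac12 A_0 + \tfrac14(B_0 + C_0) = \begin{pmatrix} \tfrac14(s - 1) \\ \tfrac14(t - u) \end{pmatrix}, \quad B_0 = \begin{pmatrix} s \\ t \end{pmatrix}, \quad\text{and}\quad C_0 = \begin{pmatrix} 1 \\ u \end{pmatrix},$

where the first vertex $A_0$ has been replaced. We have two cases:

• If $0 \le s \le 1$, then $w(\widehat\Delta_0) = 2$ and $w(\widehat\Delta_1) = \tfrac54 - \tfrac14 s \le \tfrac54$, which implies that $w(\widehat\Delta_0)/w(\widehat\Delta_1) \ge \tfrac85$.

• If $1 < s < 1.00001$, then $w(\widehat\Delta_0) > 2$ and $w(\widehat\Delta_1) = \tfrac34 s + \tfrac14$, so that $w(\widehat\Delta_1) < 1.0000075$ and
$w(\widehat\Delta_0)/w(\widehat\Delta_1) > 1.9999$.

For all $s$ satisfying $0 \le s < 1.00001$, it follows that $w(\widehat\Delta_0)/w(\widehat\Delta_1) \ge \tfrac85$, and hence that

$w(\widehat\Delta_0)^3 / w(\widehat\Delta_1)^3 \ge (\tfrac85)^3 = 4.096.$

The area of $\widehat\Delta_1$ is half the area of $\widehat\Delta_0$. Hence the ratio of the flatnesses of $\widehat\Delta_1$ and $\widehat\Delta_0$ satisfies

$\dfrac{\Gamma(\widehat\Delta_1)}{\Gamma(\widehat\Delta_0)} = \dfrac{A(\widehat\Delta_1)}{A(\widehat\Delta_0)} \cdot \dfrac{w(\widehat\Delta_0)^3}{w(\widehat\Delta_1)^3} \ge \tfrac12 \cdot 4.096 = 2.048 > 1.01.$

The same argument applies when $\widehat\Delta_1$ is the result of an outside contraction in which vertex $A_0$ is
the worst.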
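The hand computation for $\ell = 1$ can be replayed in exact arithmetic. The sketch below assumes the standard inside contraction point $\tfrac12 A_0 + \tfrac14(B_0 + C_0)$ (contraction coefficient $\tfrac12$, as in (3.38)) and uses hypothetical values $t = 3$, $u = \tfrac12$; any values giving a nondegenerate triangle would serve, since the area ratio is $\tfrac12$ and the widths depend only on $s$. It computes the flatness as area over width cubed.

```python
from fractions import Fraction


def width(tri):
    xs = [p[0] for p in tri]
    return max(xs) - min(xs)


def area(tri):
    """Unsigned triangle area via the cross product (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2


def flatness(tri):
    """Flatness = area / width^3, as restated in Section 3.8.2."""
    return area(tri) / width(tri) ** 3


def inside_contraction(worst, p, q):
    """Inside contraction point (1/2) worst + (1/4)(p + q); coefficient 1/2 assumed."""
    return (worst[0] / 2 + (p[0] + q[0]) / 4, worst[1] / 2 + (p[1] + q[1]) / 4)


def flatness_ratio(s, t, u):
    """Gamma(hat Delta_1) / Gamma(hat Delta_0) when A_0 is worst and contracts inside."""
    A0, B0, C0 = (Fraction(-1), -u), (s, t), (Fraction(1), u)
    tri0 = [A0, B0, C0]
    tri1 = [inside_contraction(A0, B0, C0), B0, C0]
    return flatness(tri1) / flatness(tri0)


# Hypothetical t, u for illustration.
t, u = Fraction(3), Fraction(1, 2)
for s in [Fraction(0), Fraction(1, 2), Fraction(1), Fraction("1.000005")]:
    print(float(s), float(flatness_ratio(s, t, u)))
```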

But when the sequence of moves begins with a contraction in which vertex $B_0$ or $C_0$ is worst, we
must break into further cases, and the analysis becomes too complicated to do by hand. To examine
such sequences of RNM moves, we use a Mathematica program that generates inequalities involving
vertices of $\widehat\Delta_\ell$ and the function $\psi$ of (3.34), as described in Section 3.8.2.

Any sequence of RNM moves (where a move is specified by the worst vertex and the type of
move) starting with triangle $\Delta_{k_0}$ gives rise to a set of algebraic inequalities in $s$, $t$, and $u$. The $i$th
of these inequalities has one of the forms $\phi_i(s) + \nu_i t + \omega_i u > \theta_i$ or $\phi_i(s) + \nu_i t + \omega_i u \ge \theta_i$,
where $\phi_i(s)$ is a quadratic polynomial in $s$ with rational coefficients, and $\nu_i$, $\omega_i$, and $\theta_i$ are rational
constants.

The next step is to determine whether there are acceptable values of $s$, $t$, and $u$ for which these
inequalities are satisfied. To do so, we begin by treating $s$ as (temporarily) constant and considering
the feasibility of a system of linear inequalities in $t$ and $u$, namely the system $Nz \ge d$, where
$z = (t\ \ u)^T$, the $i$th row of $N$ is $(\nu_i\ \ \omega_i)$, and $d_i = \theta_i - \phi_i(s)$. A variant of Farkas' lemma [26, page
89] states that the system of linear inequalities $Nz \ge d$ is feasible if and only if $\gamma^T d \le 0$ for every
vector $\gamma$ satisfying $\gamma \ge 0$ and $N^T\gamma = 0$. If the only nonnegative vector $\gamma$ satisfying $N^T\gamma = 0$ is
$\gamma = 0$, then $Nz \ge d$ is feasible for any $d$.

The existence (or not) of a nonnegative nonzero $\gamma$ in the null space of $N^T$ can be determined
symbolically by noting that the system $Nz \ge d$ is feasible if and only if it is solvable for every subset
of three rows of $N$ (a consequence of Helly's theorem in the plane). Let $\bar N$ denote the $3 \times 2$ matrix consisting of three specified rows of $N$, with a
similar meaning for $\bar d$. To determine the feasibility of $\bar N z \ge \bar d$, we first find a vector $\bar\gamma \ne 0$ such that

$\bar N^T \bar\gamma = 0.$

If $\bar N$ has rank 2, then $\bar\gamma$ is unique (up to a scale factor) and we can write $\bar N^T$ (or a column
permutation of it) so that the leftmost $2 \times 2$ submatrix $B$ is nonsingular. Then, with $\bar N^T = (B \;\; b)$, we may take

$\bar\gamma = \begin{pmatrix} -B^{-1} b \\ 1 \end{pmatrix},$

where the components of $B^{-1}$ and $b$ are rational numbers. If (with appropriate scaling) $\bar\gamma \ge 0$ with
at least one positive component, then $\bar N z \ge \bar d$ is solvable if and only if $\bar\gamma^T \bar d \le 0$. If the components
of $\bar\gamma$ do not all have the same sign, $\bar N z \ge \bar d$ is solvable for any $\bar d$.

If $\bar N$ has rank one, its three rows must be scalar multiples of the same vector, i.e., the $i$th row is
$(\beta_i\nu_1,\ \beta_i\omega_1)$, and the null vectors of $\bar N^T$ are linear combinations of $(\beta_2, -\beta_1, 0)^T$, $(0, \beta_3, -\beta_2)^T$, and
$(\beta_3, 0, -\beta_1)^T$.

Since the components of $\bar d$ are quadratic polynomials in $s$ and the components of each $\bar\gamma$ are
rational numbers, the conditions for feasibility of $Nz \ge d$ (e.g., the conjunction of the conditions
$\bar\gamma^T \bar d \le 0$ for each set of three rows of $N$) can be expressed as a Boolean combination of quadratic
inequalities in $s$ with rational coefficients that, for a given value of $s$, evaluates to "True" if and only
if there exist $t$ and $u$ such that these inequalities are satisfied.
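For a fixed value of $s$, the feasibility question above is a small two-variable linear problem. The sketch below decides it exactly, but note that it swaps in a different technique from the symbolic null-vector procedure described in the text: it enumerates candidate vertices (pairs of tight, linearly independent constraints) and falls back to a one-dimensional check when $N$ has rank at most 1. It handles only non-strict inequalities $Nz \ge d$. A Farkas certificate check and a Helly-style "every three rows" check are included for comparison; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def satisfies(N, d, z):
    return all(a * z[0] + b * z[1] >= di for (a, b), di in zip(N, d))


def feasible_2d(N, d):
    """Decide whether {z in R^2 : N z >= d} is nonempty, in exact arithmetic."""
    rows = list(zip(N, d))
    candidates = []
    for ((a1, b1), d1), ((a2, b2), d2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det != 0:  # Cramer's rule for the point where both constraints are tight
            candidates.append((Fraction(d1 * b2 - d2 * b1, det), Fraction(a1 * d2 - a2 * d1, det)))
    if candidates:  # rank 2: a nonempty feasible set has such a vertex
        return any(satisfies(N, d, z) for z in candidates)
    nonzero = [row for row in N if row != (0, 0)]
    if not nonzero:  # rank 0: every constraint reads 0 >= d_i
        return all(di <= 0 for di in d)
    ga, gb = nonzero[0]  # rank 1: each row is c * (ga, gb); set tau = ga*z1 + gb*z2
    lo, hi = None, None
    for (a, b), di in rows:
        c = Fraction(a, ga) if ga != 0 else Fraction(b, gb)
        if c > 0:
            lo = di / c if lo is None else max(lo, di / c)
        elif c < 0:
            hi = di / c if hi is None else min(hi, di / c)
        elif di > 0:
            return False
    return lo is None or hi is None or lo <= hi


def feasible_by_triples(N, d):
    """Helly's theorem in the plane: feasible iff every 3 constraints are jointly feasible."""
    k = min(3, len(N))
    return all(feasible_2d([N[i] for i in T], [d[i] for i in T])
               for T in combinations(range(len(N)), k))


def farkas_certificate(N, d, gamma):
    """gamma >= 0, N^T gamma = 0 and gamma^T d > 0 together certify infeasibility."""
    nt_gamma = [sum(g * row[k] for g, row in zip(gamma, N)) for k in range(2)]
    return (all(g >= 0 for g in gamma) and nt_gamma == [0, 0]
            and sum(g * di for g, di in zip(gamma, d)) > 0)


# z1 >= 0, z2 >= 0, -z1 - z2 >= -1: feasible.  With right-hand side 1 instead: infeasible.
N1, d1 = [(1, 0), (0, 1), (-1, -1)], [0, 0, -1]
N2, d2 = [(1, 0), (0, 1), (-1, -1)], [0, 0, 1]
print(feasible_2d(N1, d1), feasible_2d(N2, d2), farkas_certificate(N2, d2, [1, 1, 1]))
```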

To verify the result of the proposition for a given sequence of $\ell$ RNM moves applied to $\widehat\Delta_0$, we
need to compute the flatness of $\widehat\Delta_\ell$, which is, by construction, equal to the flatness of $\Delta_{k_0+\ell}$ measured
in coordinate frame $\mathfrak{F}_2$; see (3.32). We can directly calculate the ratio of the area of $\widehat\Delta_\ell$ to the area
of $\widehat\Delta_0$ from the number of contractions in the move sequence, since each contraction multiplies
the area by $\tfrac12$. The width of $\widehat\Delta_\ell$ can be obtained using inequalities and linear polynomials in $s$, since
the width is determined by the largest and smallest $x$-coordinates, which are linear polynomials in
$s$. Consequently, the condition that the flatness of each triangle in the sequence is less than 1.01
times the original flatness can be expressed as a Boolean combination of (at most cubic) polynomial
inequalities in $s$, where $s$ is constrained to satisfy $0 \le s < 1.00001$.

To determine whether there are allowable values of s for which a specified sequence of RNM 
moves is possible, observe that a Boolean combination of polynomial inequalities in s will evaluate 
to "True" for s in a certain union of intervals that can be computed as follows. We first find the 
values of s that are solutions of the polynomial equations obtained by replacing any inequalities by 
equalities. Then, between each adjacent pair of solutions, we choose a test value (e.g., the midpoint) 
and check whether the associated inequality evaluates to "True" on that interval. 
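The interval procedure just described (cut $[0, 1.00001]$ at the real roots of every polynomial involved, then test one point per cell) can be sketched in a few lines. This is a floating-point illustration under stated assumptions, not the exact symbolic computation used for the proposition: roots of each low-degree polynomial are isolated by splitting at the roots of its derivative and bisecting on each monotone piece, so endpoints are only approximate. The example condition and all names are hypothetical.

```python
def evaluate(coeffs, s):
    """Horner evaluation; coeffs are listed in increasing degree order."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * s + c
    return acc


def derivative(coeffs):
    return [k * c for k, c in enumerate(coeffs)][1:]


def real_roots(coeffs, lo, hi, tol=1e-12):
    """Approximate real roots in [lo, hi]: split at roots of p', bisect each monotone piece."""
    while coeffs and coeffs[-1] == 0:
        coeffs = coeffs[:-1]
    if len(coeffs) <= 1:
        return []  # constant polynomial: no sign changes
    pts = [lo] + real_roots(derivative(coeffs), lo, hi, tol) + [hi]
    roots = []
    for a, b in zip(pts, pts[1:]):
        fa, fb = evaluate(coeffs, a), evaluate(coeffs, b)
        if fa == 0:
            roots.append(a)
        elif fa * fb < 0:
            while b - a > tol:
                m = (a + b) / 2
                fm = evaluate(coeffs, m)
                if fa * fm <= 0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append((a + b) / 2)
    if evaluate(coeffs, hi) == 0:
        roots.append(hi)
    return sorted(set(roots))


def true_intervals(predicate, polys, lo, hi):
    """Cells of [lo, hi] on which a Boolean combination of sign conditions holds."""
    cuts = sorted({lo, hi, *(r for p in polys for r in real_roots(p, lo, hi))})
    result = []
    for a, b in zip(cuts, cuts[1:]):
        if predicate((a + b) / 2):  # test value: the midpoint of the cell
            if result and result[-1][1] == a:
                result[-1] = (result[-1][0], b)  # merge adjacent true cells
            else:
                result.append((a, b))
    return result


# Hypothetical condition: s^2 - 1/4 > 0 and 1 - s > 0, for 0 <= s <= 1.00001.
polys = [[-0.25, 0.0, 1.0], [1.0, -1.0]]


def example_condition(s):
    return evaluate(polys[0], s) > 0 and evaluate(polys[1], s) > 0


print(true_intervals(example_condition, polys, 0.0, 1.00001))
```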

The computation time can be cut in half by considering only sequences that begin with an inside
contraction, for the following reason. The outside contraction point for an original triangle $\Delta$ with
vertices $p_1$, $p_2$, and $p_3$ is equal to the inside contraction point for the triangle $\Delta'$ with vertices $p_1$, $p_2$, and $p_r$, whose
worst vertex is the reflection point $p_r$ of $\Delta$. With exact computation, the conditions for an outside
contraction of $\Delta$ differ from those for an inside contraction of $\Delta'$ if equality holds in some of the
comparisons. In particular, if $f(p_3) > f(p_r) \ge f(p_2)$, then $\Delta$ will undergo an outside contraction
and $\Delta'$ will undergo an inside contraction; but if $f(p_3) = f(p_r)$, then both $\Delta'$ and $\Delta$ will undergo
inside contractions. Since our inequalities allow for a small error in comparisons, this difference will
not change the result, and we may assume that the RNM move at $\Delta_{k_0}$ is an inside contraction.
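The identity behind this reduction can be checked directly. In the sketch below (names hypothetical), contraction points are taken as $c + \tfrac12(p_r - c)$ (outside) and $c + \tfrac12(p_{\rm worst} - c)$ (inside), where $c$ is the midpoint of the two retained vertices; this assumes the standard contraction coefficient $\tfrac12$. The contraction type follows the rule quoted above: outside if $f_r < f_{\rm worst}$, inside otherwise (given that the reflection was rejected). The function values in the test are hypothetical.

```python
from fractions import Fraction


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def reflect(p1, p2, worst):
    return (p1[0] + p2[0] - worst[0], p1[1] + p2[1] - worst[1])


def outside_contraction(p1, p2, worst):
    """c + (p_r - c)/2, with c the midpoint of the retained vertices (coefficient 1/2 assumed)."""
    c, pr = midpoint(p1, p2), reflect(p1, p2, worst)
    return ((c[0] + pr[0]) / 2, (c[1] + pr[1]) / 2)


def inside_contraction(p1, p2, worst):
    """c + (worst - c)/2."""
    c = midpoint(p1, p2)
    return ((c[0] + worst[0]) / 2, (c[1] + worst[1]) / 2)


def contraction_type(f_worst, f_r):
    """Contraction chosen once the reflection has been rejected (f_r >= f_next)."""
    return "outside" if f_r < f_worst else "inside"


p1, p2, p3 = (Fraction(0), Fraction(0)), (Fraction(2), Fraction(0)), (Fraction(1), Fraction(3))
pr = reflect(p1, p2, p3)  # Delta' has vertices p1, p2, pr, with worst vertex pr
print(outside_contraction(p1, p2, p3) == inside_contraction(p1, p2, pr), reflect(p1, p2, pr) == p3)
```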

Finally, the definition of the RNM algorithm imposes further constraints on valid move patterns. 
For example, if a reflection occurs, the reflection point must be strictly better than the second- 
worst vertex, so this reflection point cannot be the worst point in the new triangle. Such sequences 
(impossible in the RNM algorithm) would be permitted by the small error allowed in the inequalities, 
so they are explicitly disallowed in the Mathematica code. 

Putting all this together, a program can test each sequence of valid operations that begins with
an inside contraction to determine whether there exists an initial triangle for which the ratio of the
flatnesses, measured in $\mathfrak{F}_2$, is less than 1.01. The results of this computation show that, within no
more than 14 RNM moves following a contraction, a triangle is always reached for which the ratio of
the flatnesses, measured in the second coordinate frame $\mathfrak{F}_2$, is at least 1.01. We stress that the count
of 14 moves includes a mixture of reflections and both forms of contraction. Details of these move
sequences can be found in the appendix, where we list the $s$-values and the associated sequences of
14 or fewer RNM moves for which the ratio of the flatnesses remains less than 1.01. □

Proposition 3.15 used $\mathfrak{F}_2$, but its analogue for $\mathfrak{F}_1$ follows almost immediately with a slightly
smaller constant in place of 1.01.

Lemma 3.16. Under the assumptions of Proposition 3.15, there exists $\ell$ with $1 \le \ell \le 14$ such that

$\Gamma^{(1)}_{k_0+\ell} \ge 1.001\,\Gamma^{(1)}_{k_0}.$

Proof. The base point of $\mathfrak{F}_1$ is the worst vertex of $\Delta_{k_0}$; the base point of $\mathfrak{F}_2$ is the midpoint of the edge
of $\Delta_{k_0}$ joining the two vertices whose $x$-coordinates are leftmost and rightmost when measured in $\mathfrak{F}_1$.
By choosing $k_0$ large enough, the two base points can be made arbitrarily close. Lemma 3.6(iii)
with $\epsilon = 0.0001$ shows that for large enough $k_0$, the flatnesses of triangles $\Delta_{k_0}$ and $\Delta_{k_0+\ell}$ measured
in coordinate frames $\mathfrak{F}_1$ and $\mathfrak{F}_2$ satisfy

(3.39)  $0.9999\,\Gamma^{(1)}_{k_0} \le \Gamma^{(2)}_{k_0} \le 1.0001\,\Gamma^{(1)}_{k_0}$ and $0.9999\,\Gamma^{(1)}_{k_0+\ell} \le \Gamma^{(2)}_{k_0+\ell} \le 1.0001\,\Gamma^{(1)}_{k_0+\ell}.$

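Now, for $\ell$ as in Proposition 3.15,

$$\begin{aligned}
\Gamma^{(1)}_{k+\ell} &> 0.9999\,\Gamma^{(2)}_{k+\ell} && \text{(by (3.39))}\\
&\ge 0.9999\,(1.01)\,\Gamma^{(2)}_{k} && \text{(by Proposition 3.15)}\\
&> 0.9999\,(1.01)(0.9999)\,\Gamma^{(1)}_{k} && \text{(by (3.39))}\\
&> 1.001\,\Gamma^{(1)}_{k}.
\end{aligned}$$

□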
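The constants in this proof can be confirmed in exact rational arithmetic. The short Python sketch below is an illustration, not part of the original argument: it multiplies the two frame-comparison factors 0.9999 from (3.39) with the growth factor 1.01 from Proposition 3.15 and checks that the product exceeds 1.001.

```python
from fractions import Fraction

FRAME_SWITCH = Fraction(9999, 10000)  # 0.9999, the frame-comparison factor in (3.39)
PROP_GAIN = Fraction(101, 100)        # 1.01, the growth factor from Proposition 3.15
TARGET = Fraction(1001, 1000)         # 1.001, the factor claimed in Lemma 3.16


def lemma_316_factor() -> Fraction:
    """Lower bound on Gamma^(1)_{k+l} / Gamma^(1)_k from the chain of inequalities."""
    # One frame switch at step k + l, growth measured in frame 2, one switch back at step k.
    return FRAME_SWITCH * PROP_GAIN * FRAME_SWITCH


if __name__ == "__main__":
    factor = lemma_316_factor()
    print(factor, float(factor), factor > TARGET)
```

Using `Fraction` rather than floats keeps the comparison free of rounding error, so the check is as rigorous as the hand computation.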



3.10 Completion of the proof 

The main result of this paper is the following theorem (stated as Theorem 1.1 in Section 1).

Theorem 3.17. If the RNM algorithm is applied to a function $f \in \mathcal{F}$, starting from any
nondegenerate triangle, then the algorithm converges to the unique minimizer of $f$.

Proof. In this proof, $\Gamma_j(\Delta_i)$ denotes the flatness of RNM triangle $\Delta_i$ measured in a
coordinate frame $\mathfrak{S}_j$ whose base point is the worst vertex of triangle $\Delta_j$.

Given a small positive number $\kappa$, let $k_0$ be sufficiently large (we will specify how small and
how large as we go along). As mentioned in Section 3, the RNM triangle must contract infinitely
often, so we may increase $k_0$ to assume that $\Delta_{k_0}$ contracts. Lemma 3.16 shows that the
flatness measured in $\mathfrak{S}_{k_0}$ increases by a factor of more than 1.001 within at most 14 RNM
moves; i.e., there exists $k_1$ with $k_0 < k_1 \le k_0 + 14$ such that

(3.40)   $\Gamma_{k_0}(\Delta_{k_1}) > 1.001\,\Gamma_{k_0}(\Delta_{k_0})$.

We now switch coordinate frames on the left-hand side: Lemma 3.6(iii) and Remark 3.9 show that
the flatness of $\Delta_{k_1}$ in $\mathfrak{S}_{k_1}$ is close to its flatness in $\mathfrak{S}_{k_0}$. In particular, if $k_0$
is sufficiently large, then

(3.41)   $\Gamma_{k_1}(\Delta_{k_1}) > 0.9999\,\Gamma_{k_0}(\Delta_{k_1})$.

Let $k_2 \ge k_1$ be the first iteration at or after $k_1$ such that $\Delta_{k_2}$ contracts. Lemma 3.10 shows
that if $k_0$ is sufficiently large, then from iteration $k_1$ to the beginning of iteration $k_2$, the distance
travelled by the centroid, measured in $\mathfrak{S}_{k_1}$, is less than $\kappa$. During those iterations, the RNM
triangle retains its shape and hence its flatness, as measured in $\mathfrak{S}_{k_1}$; that is,

(3.42)   $\Gamma_{k_1}(\Delta_{k_2}) = \Gamma_{k_1}(\Delta_{k_1})$.

If $\kappa$ was small enough, Lemma 3.6(iii) and Remark 3.9 again imply

(3.43)   $\Gamma_{k_2}(\Delta_{k_2}) > 0.9999\,\Gamma_{k_1}(\Delta_{k_2})$.

Combining (3.40), (3.41), (3.42), and (3.43) yields

$$\Gamma_{k_2}(\Delta_{k_2}) > (0.9999)^2(1.001)\,\Gamma_{k_0}(\Delta_{k_0}) > 1.0007\,\Gamma_{k_0}(\Delta_{k_0}).$$

If $k_0$ is sufficiently large, then repeating the process that led from $k_0$ to $k_2$ defines
$k_0 < k_2 < k_4 < \cdots$ such that

$$\Gamma_{k_{2n}}(\Delta_{k_{2n}}) > (1.0007)^n\,\Gamma_{k_0}(\Delta_{k_0})$$

for all $n$. To know that the same lower bound on $k_0$ works at every stage, we use the fact that in
Lemma 3.6(iii) the number $\delta$ is independent of $b_1$, $b_2$, and $\Delta$. Now, if $n$ is sufficiently large, then

$$\Gamma_{k_{2n}}(\Delta_{k_{2n}}) > 10.$$

But $\Delta_{k_{2n}}$ contracts, so this contradicts Lemma 3.13.

Hence the assumption made at the beginning of our long chain of results, Hypothesis 1, must be
wrong. In other words, the RNM algorithm does converge to the minimizer of $f$. □
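The arithmetic behind the final contradiction can be sketched the same way. The Python example below is an illustration rather than part of the proof: it checks exactly that $(0.9999)^2(1.001) > 1.0007$, and counts how many rounds of the argument suffice for the lower bound $(1.0007)^n\,\Gamma_{k_0}(\Delta_{k_0})$ to exceed 10. The starting flatness `gamma0` is a hypothetical positive value; the proof needs no specific one.

```python
from fractions import Fraction


def round_factor() -> Fraction:
    """Gain per round from (3.40)-(3.43): two frame switches and one Lemma 3.16 step."""
    return Fraction(9999, 10000) ** 2 * Fraction(1001, 1000)


def rounds_to_exceed(gamma0: float, bound: float = 10.0, gain: float = 1.0007) -> int:
    """Smallest n with gain**n * gamma0 > bound, for a positive starting flatness gamma0."""
    if gamma0 <= 0:
        raise ValueError("flatness must be positive")
    n, g = 0, gamma0
    while g <= bound:
        g *= gain  # one more round of the argument
        n += 1
    return n


if __name__ == "__main__":
    assert round_factor() > Fraction(10007, 10000)
    print(rounds_to_exceed(1.0))  # hypothetical starting flatness 1.0
```

The count is finite for any positive `gamma0`, which is all the contradiction with Lemma 3.13 requires.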

4 Concluding Remarks 

4.1 Why do the McKinnon examples fail? 

For general interest, we briefly revisit the smoothest McKinnon counterexample (1.1), which consists
of a twice-continuously differentiable function $f$ and a specific starting triangle for which the RNM
algorithm converges to a nonminimizing point (with nonzero gradient). The Hessian matrix is
positive semidefinite and singular at the limit point, but positive definite everywhere else. Thus
all the assumptions in our convergence theorem are satisfied except for positive-definiteness of the
Hessian, which fails at one point. Hypothesis 1 is valid for this example, and it is enlightening to
examine where the proof by contradiction fails.
The McKinnon iterates do satisfy several of the intermediate lemmas in our proof: the RNM
triangles not only flatten out (Lemma 3.8), but they do so more rapidly than the rate proved in
Lemma 3.11.⁴ However, an essential reduction step, Lemma 3.6, fails to hold for the McKinnon
example, as discussed below.

Positive-definiteness of the Hessian plays a crucial role in our proof by contradiction because it
allows us to uniformly approximate the objective function close to the limit point $p_\dagger$ by its degree-2
Taylor polynomial. After a well-defined change of variables, the function $\frac{1}{2}x^2 + y$ for a simple
triangle can then be taken as a surrogate, and we can essentially reduce the problem to studying the
RNM algorithm for the objective function $\frac{1}{2}x^2 + y$ near the non-optimal point $(0, 0)$. In the McKinnon
example (1.1), however, the objective function near the limit point $(0, 0)$ cannot be (uniformly) well
approximated by $\frac{1}{2}x^2 + y$, even after a change of variables. Although the Hessian of the McKinnon
function $f$ remains positive definite at the base points as $k \to \infty$, it becomes increasingly close
to singular, in such a way that ever-smaller changes in the base point will eventually fail to satisfy the
closeness conditions of Lemma 3.6. In fact, the actual shape of the McKinnon objective function
allows a sequence of RNM moves that is forbidden for $\frac{1}{2}x^2 + y$ near the non-optimal point $(0, 0)$,
namely an infinite sequence of inside contractions in which the best vertex is never replaced. In
dynamical terms, the McKinnon objective function allows symbolic dynamics forbidden for $\frac{1}{2}x^2 + y$
near $(0, 0)$, and these symbolic dynamics evade the contradiction in our argument.



4.2 An instance of RNM convergence 

Most of this paper has been devoted to the analysis of situations that we subsequently show cannot
occur; this is the nature of arguments by contradiction. For contrast, we present one example where
the RNM algorithm converges, as we have proved it must, on the strictly convex quadratic function

$$f(x, y) = 2x^2 + 3y^2 + xy - 3x + 5y,$$

whose minimizer is $x^* = (1, -1)^T$. Using starting vertices $(0, 0.5)^T$, $(0.25, -0.75)^T$, and $(-0.8, 0)^T$,
after 20 RNM iterations the best vertex is $(0.997986, -1.00128)^T$, and the RNM triangles are clearly
converging to the solution. The first nine iterations are depicted in Figure 6.



Figure 6: Convergence of the RNM algorithm on a strictly convex quadratic function.
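For readers who want to reproduce this experiment, here is a minimal Python sketch of the RNM iteration in two dimensions. It assumes the standard coefficients (reflection 1, contraction 1/2, shrink 1/2) and, as in the RNM algorithm, accepts a reflection point whenever it is strictly better than the second-worst vertex instead of attempting an expansion. Tie-breaking between equal function values is simplified to a stable sort, so this is not the authors' code, and individual iterates may differ from those reported above.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def rnm_step(f: Callable[[Point], float], simplex: List[Point]) -> Tuple[List[Point], str]:
    """One restricted Nelder-Mead iteration (no expansion) on a triangle."""
    best, good, worst = sorted(simplex, key=f)  # stable sort: simplified tie-breaking
    fg, fw = f(good), f(worst)
    c = ((best[0] + good[0]) / 2, (best[1] + good[1]) / 2)  # centroid of the two best vertices

    def along(alpha: float) -> Point:
        # The point c + alpha * (c - worst) on the line through the worst vertex.
        return (c[0] + alpha * (c[0] - worst[0]), c[1] + alpha * (c[1] - worst[1]))

    r = along(1.0)
    fr = f(r)
    if fr < fg:  # reflection point strictly better than the second-worst vertex: accept it
        return [best, good, r], "reflect"
    if fr < fw:  # outside contraction, accepted if no worse than the reflection point
        oc = along(0.5)
        if f(oc) <= fr:
            return [best, good, oc], "outside contraction"
    else:  # inside contraction, accepted if strictly better than the worst vertex
        ic = along(-0.5)
        if f(ic) < fw:
            return [best, good, ic], "inside contraction"
    # Otherwise shrink the other two vertices halfway toward the best vertex.
    shrunk = [best] + [((best[0] + v[0]) / 2, (best[1] + v[1]) / 2) for v in (good, worst)]
    return shrunk, "shrink"


def quad(p: Point) -> float:
    """The strictly convex quadratic of Section 4.2; its minimizer is (1, -1)."""
    x, y = p
    return 2 * x * x + 3 * y * y + x * y - 3 * x + 5 * y


if __name__ == "__main__":
    simplex = [(0.0, 0.5), (0.25, -0.75), (-0.8, 0.0)]
    for _ in range(20):
        simplex, move = rnm_step(quad, simplex)
    best = min(simplex, key=quad)
    print(best, math.dist(best, (1.0, -1.0)))
```

The paper reports the best vertex $(0.997986, -1.00128)^T$ after 20 iterations; digits from this sketch may differ if the authors' tie-breaking or iteration counting differs.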



4.3 Significance of the results in this paper 

This paper began by noting that very little is known about the theoretical properties of the original
Nelder-Mead method, despite 45 years of practice. It is fair to say that proving convergence for an
RNM algorithm in two dimensions on a restricted class of functions adds only a little more to this
knowledge. The contribution nonetheless seems of interest, both because of the lack of other results
despite determined efforts and because it introduces dynamical systems methods to the analysis.

4 As $k \to \infty$, the McKinnon triangles satisfy £S iv® for 9 = | A2 1 ( 1 + IA2D/A1 ;=3 3, where $\lambda_{1,2} = (1 \pm \sqrt{33})/8$.

Our analysis applies only to a simplified ("small step") version of the original Nelder-Mead
method, which excludes expansion steps. We have observed in thousands of computational
experiments with functions defined on $\mathbb{R}^n$ ($n \ge 2$), in which the Nelder-Mead method converges to a
minimizer, that expansion steps are almost never taken in the neighborhood of the optimum. Expansion
steps are typically taken early on, forming part of the "adaptation to the local contours" that
motivated Nelder and Mead when they originally conceived the algorithm [20].
Thus the RNM algorithm appears to represent, to a large extent, the behavior of the original method
near the solution. It would be valuable if these empirical observations could be
rigorously justified under a well-defined set of conditions. The observed good performance of the
Nelder-Mead method on many real-world problems remains a puzzle.

This paper applies dynamical systems methods to the analysis of the RNM algorithm. The use
of such ideas in the proofs, particularly that of a (rescaled) local coordinate frame in Section 3.8.2,
may also be useful in other contexts where it is valuable to connect the geometry of a simplex with
the contours of the objective function. The evolving geometric figures of the algorithm remain one
of the intuitive appeals of the original Nelder-Mead method, leading to its nickname, the "amoeba
method" [23]. There may well be other applications, although the latest direct search methods tend
to have a less clear connection with geometry.

Finally, our analysis of the RNM algorithm relies in part on the fact that the volume of the
RNM simplex is non-increasing at every iteration, thereby avoiding the difficulties associated with
expansion steps. Consequently, McKinnon's question remains open: does the original Nelder-Mead
algorithm, including expansion steps, always converge for the function $x^2 + y^2$, or more generally
for a class of functions like those treated in Theorem 1.1? We hope that further development of
the dynamical systems approach could lead to progress on this question.
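The volume property invoked here is easy to check numerically. The Python sketch below assumes the standard Nelder-Mead coefficients (reflection 1, expansion 2, contraction 1/2, shrink 1/2), which this section does not restate, and measures how each move changes the area of a triangle: a reflection preserves it, both contractions halve it, and a shrink quarters it, whereas an expansion doubles it, which is exactly the step the RNM algorithm excludes.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]  # (best, second-worst, worst)


def area(t: Triangle) -> float:
    (x1, y1), (x2, y2), (x3, y3) = t
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def replace_worst(t: Triangle, alpha: float) -> Triangle:
    """Replace the worst vertex w by c + alpha * (c - w), c the midpoint of the other two."""
    b, s, w = t
    c = ((b[0] + s[0]) / 2, (b[1] + s[1]) / 2)
    return (b, s, (c[0] + alpha * (c[0] - w[0]), c[1] + alpha * (c[1] - w[1])))


def shrink(t: Triangle, sigma: float = 0.5) -> Triangle:
    """Move every vertex a fraction sigma of the way from the best vertex."""
    b = t[0]
    return tuple((b[0] + sigma * (v[0] - b[0]), b[1] + sigma * (v[1] - b[1])) for v in t)


def area_ratios(t: Triangle) -> Dict[str, float]:
    a0 = area(t)
    return {
        "reflection": area(replace_worst(t, 1.0)) / a0,
        "expansion": area(replace_worst(t, 2.0)) / a0,  # excluded in the RNM algorithm
        "outside contraction": area(replace_worst(t, 0.5)) / a0,
        "inside contraction": area(replace_worst(t, -0.5)) / a0,
        "shrink": area(shrink(t)) / a0,
    }


if __name__ == "__main__":
    for move, ratio in area_ratios(((0.0, 0.0), (1.0, 0.0), (0.3, 1.0))).items():
        print(f"{move}: {ratio:.3f}")
```

The ratios do not depend on the triangle chosen, because the new vertex lies on the line through the worst vertex and the midpoint of the opposite edge, at $|\alpha|$ times the original height.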

Appendix: Computation for Proposition 3.15

This appendix provides details of the symbolic computation performed to prove Proposition 3.15.
We regard the coding of moves as a form of symbolic dynamics for the RNM iteration. Moves
are represented as follows: 1, 2, and 3 denote reflections with, respectively, vertex A, B, or C
of (3.36) taken as the worst vertex, i.e., the vertex replaced during the move. Similarly, 4, 5, and 6
denote inside contractions, and 7, 8, and 9 denote outside contractions, with worst vertex A, B, and C,
respectively.
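The move coding can be decoded mechanically: subtracting 1 and dividing by 3 gives the move type as the quotient and the worst vertex as the remainder. The helper names below (`decode_move`, `describe`) are illustrative and are not taken from the authors' Mathematica code.

```python
from typing import List, Sequence, Tuple

KINDS = ("reflection", "inside contraction", "outside contraction")
VERTICES = ("A", "B", "C")  # vertex labels of (3.36)


def decode_move(code: int) -> Tuple[str, str]:
    """Map a move code 1-9 to (move type, label of the worst vertex replaced)."""
    if not 1 <= code <= 9:
        raise ValueError(f"move code must be in 1..9, got {code}")
    kind, vertex = divmod(code - 1, 3)
    return KINDS[kind], VERTICES[vertex]


def describe(seq: Sequence[int]) -> List[str]:
    """Human-readable form of a move sequence such as {6, 2, 5}."""
    return [f"{kind} (worst vertex {v})" for kind, v in map(decode_move, seq)]


if __name__ == "__main__":
    for line in describe([6, 2, 5]):
        print(line)
```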

We describe a sequence of move numbers as possible for a given $s \in [0, 1.00001]$ if there exist
$t, u \in [-40.0005, 40.0005]$ such that, for the triangle described by $(s, t, u)$,

(i) the variables $s, t, u$ satisfy the inequality implied by (3.35) for each RNM move,

(ii) the flatness after each step is less than or equal to 1.01 times the original flatness, and

(iii) no reflection undoes an immediately preceding reflection.

Remark 4.1. Because (3.35) involves a relaxation of $10^{-6}$, a sequence characterized as "possible"
using only the first two properties listed above could be impossible for the RNM algorithm in exact
arithmetic. This is why the third condition explicitly prohibits sequences in which a reflection
undoes an immediately preceding reflection, something that can never happen in the RNM algorithm.

In the proof of Proposition 3.15 we described a symbolic algorithm for computing all possible
sequences beginning with an inside contraction. The Mathematica output below lists all these
sequences.

{5} possible for s in {{0.999999, 1.00001}}
{5, 6} possible for s in {{0.999999, 1.00001}}
{6} possible for s in {{0.582145, 1.}}
{6, 2} possible for s in {{0.582145, 0.737035}}
{6, 2, 1} possible for s in {{0.582145, 0.695708}}
{6, 2, 1, 3} possible for s in {{0.582145, 0.654949}}
{6, 2, 1, 3, 2} possible for s in {{0.582145, 0.654949}}
{6, 2, 1, 3, 6} possible for s in {{0.582145, 0.654949}}
{6, 2, 1, 3, 6, 2} possible for s in {{0.616769, 0.654949}}
{6, 2, 1, 3, 6, 2, 5} possible for s in {{0.616769, 0.64706}}
{6, 2, 1, 3, 6, 8} possible for s in {{0.582145, 0.64706}}
{6, 2, 1, 3, 6, 8, 4} possible for s in {{0.582145, 0.623495}}
{6, 2, 1, 3, 9} possible for s in {{0.582145, 0.644579}}
{6, 2, 1, 6} possible for s in {{0.582145, 0.695708}}
{6, 2, 1, 9} possible for s in {{0.582145, 0.673138}}
{6, 2, 1, 9, 2} possible for s in {{0.616769, 0.673138}}
{6, 2, 1, 9, 2, 5} possible for s in {{0.616769, 0.64706}}
{6, 2, 1, 9, 8} possible for s in {{0.582145, 0.64706}}
{6, 2, 1, 9, 8, 4} possible for s in {{0.582145, 0.623495}}
{6, 2, 5} possible for s in {{0.582145, 0.737035}}
{6, 2, 5, 4} possible for s in {{0.582145, 0.695708}}
{6, 2, 5, 7} possible for s in {{0.582145, 0.681931}}
{6, 2, 5, 7, 6} possible for s in {{0.582145, 0.635866}}
{6, 2, 5, 7, 9} possible for s in {{0.582145, 0.681931}}
{6, 2, 5, 7, 9, 5} possible for s in {{0.582145, 0.679967}}
{6, 2, 5, 7, 9, 8} possible for s in {{0.582145, 0.663254}}
{6, 2, 5, 7, 9, 8, 4} possible for s in {{0.582145, 0.646912}}
{6, 2, 5, 7, 9, 8, 7} possible for s in {{0.582145, 0.663254}}
{6, 2, 5, 7, 9, 8, 7, 6} possible for s in {{0.582145, 0.663254}}
{6, 2, 5, 7, 9, 8, 7, 6, 5} possible for s in {{0.589537, 0.663254}}
{6, 2, 5, 7, 9, 8, 7, 6, 5, 1} possible for s in {{0.589537, 0.635373}}
{6, 2, 5, 7, 9, 8, 7, 9} possible for s in {{0.582145, 0.65445}}
{6, 2, 5, 7, 9, 8, 7, 9, 5} possible for s in {{0.582145, 0.651784}}
{6, 2, 5, 7, 9, 8, 7, 9, 5, 4} possible for s in {{0.582145, 0.651784}}
{6, 2, 5, 7, 9, 8, 7, 9, 5, 4, 3} possible for s in {{0.582145, 0.651784}}
{6, 2, 5, 7, 9, 8, 7, 9, 8} possible for s in {{0.597869, 0.65445}}
{6, 2, 5, 7, 9, 8, 7, 9, 8, 4} possible for s in {{0.597869, 0.65445}}
{6, 2, 5, 7, 9, 8, 7, 9, 8, 4, 6} possible for s in {{0.597869, 0.65445}}
{6, 2, 5, 7, 9, 8, 7, 9, 8, 4, 6, 2} possible for s in {{0.597869, 0.654004}}
{6, 2, 5, 7, 9, 8, 7, 9, 8, 4, 6, 2, 5} possible for s in {{0.64094, 0.654004}}
{6, 2, 5, 7, 9, 8, 7, 9, 8, 4, 6, 8} possible for s in {{0.64094, 0.65445}}
{6, 2, 8} possible for s in {{0.582145, 0.614711}}
{6, 5} possible for s in {{0.582145, 1.}}
{6, 8} possible for s in {{0.582145, 0.853944}}
{6, 8, 4} possible for s in {{0.582145, 0.810502}}
{6, 8, 7} possible for s in {{0.582145, 0.853944}}
{6, 8, 7, 6} possible for s in {{0.582145, 0.853944}}
{6, 8, 7, 9} possible for s in {{0.582145, 0.818183}}
{6, 8, 7, 9, 5} possible for s in {{0.582145, 0.811611}}
{6, 8, 7, 9, 8} possible for s in {{0.582145, 0.818183}}
{6, 8, 7, 9, 8, 4} possible for s in {{0.582145, 0.818183}}
{6, 8, 7, 9, 8, 4, 6} possible for s in {{0.763168, 0.818183}}
{6, 8, 7, 9, 8, 4, 6, 2} possible for s in {{0.763168, 0.817831}}
{6, 8, 7, 9, 8, 7} possible for s in {{0.582145, 0.777853}}
{6, 8, 7, 9, 8, 7, 6} possible for s in {{0.582145, 0.777853}}
{6, 8, 7, 9, 8, 7, 6, 5} possible for s in {{0.589537, 0.777853}}
{6, 8, 7, 9, 8, 7, 6, 5, 1} possible for s in {{0.589537, 0.777853}}
{6, 8, 7, 9, 8, 7, 9} possible for s in {{0.582145, 0.751661}}
{6, 8, 7, 9, 8, 7, 9, 5} possible for s in {{0.582145, 0.751661}}
{6, 8, 7, 9, 8, 7, 9, 5, 4} possible for s in {{0.582145, 0.751661}}
{6, 8, 7, 9, 8, 7, 9, 5, 4, 3} possible for s in {{0.582145, 0.751661}}
{6, 8, 7, 9, 8, 7, 9, 8} possible for s in {{0.597869, 0.694824}}
{6, 8, 7, 9, 8, 7, 9, 8, 4} possible for s in {{0.597869, 0.694824}}
{6, 8, 7, 9, 8, 7, 9, 8, 4, 6} possible for s in {{0.597869, 0.694824}}
{6, 8, 7, 9, 8, 7, 9, 8, 4, 6, 2} possible for s in {{0.597869, 0.694824}}
{6, 8, 7, 9, 8, 7, 9, 8, 4, 6, 2, 5} possible for s in {{0.64094, 0.663616}}
{6, 8, 7, 9, 8, 7, 9, 8, 4, 6, 8} possible for s in {{0.64094, 0.663616}}



All we need from this computation is that there is no possible sequence of 14 steps or more. In 
other words, following an inside contraction, the flatness will be greater than 1.01 times the original 
flatness after no more than 14 steps (including the initial contraction). 
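Because the list above is plain Mathematica output, the claim that no possible sequence has 14 or more moves can be rechecked by parsing it. The Python sketch below assumes each entry has the form `{m1, m2, ...} possible for s in {{lo, hi}}` exactly as printed above. It returns the longest sequence length and any listed sequence with a missing proper prefix; the list should have none, since every prefix of a possible sequence is itself possible.

```python
import re
from typing import Iterable, List, Set, Tuple

Seq = Tuple[int, ...]
ENTRY = re.compile(r"\{([\d,\s]+)\}\s+possible for s in \{\{([\d.]+),\s*([\d.]+)\}\}")


def parse_entry(line: str) -> Tuple[Seq, Tuple[float, float]]:
    """Parse one line of the output into (move sequence, s-interval)."""
    m = ENTRY.search(line)
    if m is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    moves = tuple(int(tok) for tok in m.group(1).split(","))
    return moves, (float(m.group(2)), float(m.group(3)))  # float("1.") == 1.0


def check_list(lines: Iterable[str], bound: int = 14) -> Tuple[int, bool, List[Seq]]:
    """Return (longest length, longest < bound, sequences with a missing prefix)."""
    seqs: Set[Seq] = {parse_entry(line)[0] for line in lines if line.strip()}
    longest = max(len(s) for s in seqs)
    orphans = sorted(s for s in seqs if any(s[:k] not in seqs for k in range(1, len(s))))
    return longest, longest < bound, orphans


if __name__ == "__main__":
    sample = """
{6} possible for s in {{0.582145, 1.}}
{6, 2} possible for s in {{0.582145, 0.737035}}
{6, 2, 5} possible for s in {{0.582145, 0.737035}}
""".splitlines()
    print(check_list(sample))  # (3, True, [])
```

Applied to every entry of the list, the longest sequence is {6, 2, 5, 7, 9, 8, 7, 9, 8, 4, 6, 2, 5}, which has 13 moves.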

Remarks about the list of possible sequences 

The remarks in this section are not needed for the proof, but they may give further insight into the 
behavior of the RNM algorithm as well as clear up some potential ambiguity about the computer 
output above. 

• That the sequence {4} is not possible (i.e., that an inside contraction with A as the worst vertex
immediately increases the flatness by at least a factor of 1.01) was already shown near the
beginning of the proof of Proposition 3.15.

• The bound 40.0005 on $|t|$ and $|u|$ need not be fed into the program, because the program
automatically calculates stronger inequalities that are necessary for a contraction to occur.

• Move sequences that do not appear in the list may still occur in actual runs of the RNM
algorithm, but then the flatness must grow by more than a factor of 1.01. Similarly, a move
sequence appearing in the list may occur while running the RNM algorithm even if $s$ lies outside
the given interval. For example, one can show that there exist triangles with $s < 0.582145$
on which the RNM algorithm takes move {6}.

• One cannot predict from the list which step causes the flatness to grow beyond the factor
of 1.01. For example, under our definition the sequence {6, 2, 1, 3, 2} is possible (for a certain
range of $s$), but the extended sequence {6, 2, 1, 3, 2, 1} is not. This should not be taken to mean
that the last reflection {1} caused the increase in flatness, since reflections do not change the
flatness (measured in the same coordinate frame). Rather, there may exist a triangle in the
given range that, for the objective function $f(\lambda, \mu) = \frac{1}{2}\lambda^2 + \mu$, will take the sequence of steps
{6, 2, 1, 3, 2, 1}. What must be the case, however, is that for any such triangle the initial inside
contraction {6} will already have increased the flatness by a factor of at least 1.01.

• One cannot deduce that in every run of the RNM algorithm, every sufficiently advanced sequence
of 14 steps involves a contraction. Experiments show that, when any test for flatness is omitted,
a sequence beginning with {6} can legitimately be followed by a very large number of reflection
steps during which the flatness does not change. Thus we truly needed Lemma 3.10
in addition to Proposition 3.15 to complete our proof.



• The entire computation took about 11 minutes on an Intel Xeon 3.0 GHz processor. 